FOREWORD


The  Fourth  Army  Conference  on  Applied  Mathematics  and  Computing  was  held  27-30 
May  1986  at  Cornell  University,  Ithaca,  New  York.  It  coincided  with  the
formal  opening  of  the  recently  established  Mathematical  Sciences  Institute 
(MSI).  This  meeting's  seven  invited  speakers  addressed  the  vital  areas  of
combustion,  computational  fluid  dynamics,  parallel  computation,  stochastic 
analysis,  multiple  bifurcation,  numerical  solutions  of  partial  differential 
equations  and  problems  in  many  scales  of  length  and  time  in  modern  computing 
environments.  There  were  two  special  sessions  that  dealt  with  Stochastic
Algorithms  and  Computational  Vision,  and  Probabilistic  Methods  in  Solid
Mechanics.  The  one  hundred  and  eight  contributed  technical  papers  covered 
nearly  the  entire  spectrum  of  basic  research.  During  the  course  of  the 
meeting  several  synergetic  relationships  developed,  and  the  feedback  from  the 
Army  scientists  was  very  positive. 

As  in  previous  meetings,  this  meeting  provided  its  attendees  a  chance  to  see 
the  many  scientific  developments  taking  place  in  various  Army  laboratories. 
Through  these  meetings,  techniques  developed  at  one  installation  are  brought 
to  the  attention  of  scientists  at  other  places,  thus  reducing  duplication  of 
effort.  Another  important  phase  of  these  meetings  is  presenting  the  members 
of  the  audience  an  opportunity  to  hear  nationally  known  scientists  discuss 
recent  developments  in  their  own  fields.  This  year's  invited  speakers,
together  with  the  titles  of  their  addresses,  are  listed  below.  These  gentlemen
were  more  than  willing  to  discuss  various  problems  of  special  interest  to
scientists  in  the  Army  agencies.

SPEAKERS  AND  AFFILIATION                  TITLES  OF  ADDRESS

Professor  K.  G.  Wilson                   Renormalization  Groups  and  Problems
Cornell  University                         in  Many  Scales  of  Length

Professor  Richard  Ewing                   Numerical  Solution  of  Partial
University  of  Wyoming                     Differential  Equations

Professor  John  Guckenheimer               Multiple  Bifurcation
Cornell  University

Professor  Eugene  Wong                     Stochastic  Differential  Forms
University  of  California,  Berkeley

Professor  Richard  Karp                    The  Complexity  of  Parallel
University  of  California,  Berkeley       Computation

Professor  A.  F.  Ghoniem                  Computing  Unsteady  Reacting  Flows
Massachusetts  Institute  of  Technology    Using  Vortex  Methods

Professor  A.  J.  Majda                    High  Mach  Number  Combustion
Princeton  University



The  benefits  derived  from  these  conferences  depend  a  great  deal  on  the  host's 
Chairman  on  Local  Arrangements.  The  attendees  at  this  meeting  were  fortunate 
to  have  Professor  G.  S.  S.  Ludford,  Director  of  the  MSI,  serving  in  this 
capacity.  He,  together  with  members  of  his  capable  staff,  provided  all  those 
things,  such  as  projection  equipment,  travel  information,  etc.,  needed  for  an 
enjoyable  and  profitable  symposium. 

The  Army  Mathematics  Steering  Committee  is  the  sponsor  of  these  Army
Conferences  on  Applied  Mathematics  and  Computing.  The  members  of  this 
committee  were  pleased,  not  only  with  the  large  number  of  contributed  papers, 
but  also  with  the  scientific  quality  of  these  papers.  They  are  also  pleased 
to  be  able  to  provide  the  transactions  of  this  conference.  It  is  hoped  the
scientific  ideas  contained  therein  will  benefit  not  only  those  who  were  able
to  attend  the  symposium,  but  also  the  many  others  who  did  not  enjoy  that
privilege.




TABLE  OF  CONTENTS 


Title  Page 


Foreword  .  iii 

Table  of  Contents  .  v 

Program  .  xiii

Interactions,  Bifurcations,  and  Instabilities 
of  Hydrodynamic  Surfaces 
Craig  Fithian,  Carl  L.  Gardner,  James  Glimm,
John  Grove,  Oliver  McBryan,  John  Scheuermann,
Ralph  Menikoff,  and  David  H.  Sharp  .  1

Nonlinear  Viscoelastic  Materials  With  Fading  Memory
John  A.  Nohel  . 21 

Recent  Developments  in  Nonstrictly  Hyperbolic  Conservation  Laws 
Michael  Shearer  .  43 

Applications  of  Matrix  Factorization  in  Hydrodynamic  Stability 

Phillip  J.  Morris  .  53 

A  Numerical  Study  of  the  Effect  of  Curvature  on  Detonation  Speed 

B.  Bukiet  and  J.  Jones .  57 

Pressure  Transients  in  a  Cavity  Due  to  Impulsive  Loads 

C.  Helleur  B.  Tabarrok  and  R.  G.  Fenton  .  83 

Computation  of  Weight  Functions  in  Two  Dimensional 
Anisotropic  Bodies 

T.  L.  Sham .  95 

Micromechanics  of  Shear  Banding  in  High  Strength  Steel 

Dennis  M.  Tracey,  Colin  E.  Freese,  and  Paul  J.  Perrone .  103 

Fractals,  Fragmentation,  and  Failure 

Donald  L.  Turcotte .  117 

Numerical  Solution  to  a  System  of  Random  Volterra  Integral
Equations 

N.  Medhin,  M.  Sambandham  and  C.K.  Zoltani .  123 

Approximate  Methods  for  Structural  Reliability

Mircea  Grigoriu  and  Arnold  Buss  .  143


*This  Table  of  Contents  lists  only  the  papers  that  are  published  in  the
Technical  Manual.  For  a  list  of  all  the  papers  presented  at  the  Fourth
Army  Conference  on  Applied  Mathematics  and  Computing,  see  the  agenda.



The  Theory  of  Random  Wave  Operators

Marc  A.  Berger  .  159

Characteristic  Functions  of  a  Class  of  Probability 
Distributions 

Siegfried  H.  Lehnigk  .  181

Poisson  and  Extreme  Value  Limit  for  Markov  Random  Fields

Simeon  M.  Berman  .  183

A  Bound  on  the  Variation  Between  Two  Probability  Measures
in  Terms  of  the  Intensities  of  a  Discrete  Point  Process 
Relative  to  These  Probabilities 

G.R.  Andersen  .  185 

Some  Problems  of  Estimation  From  Poisson  Type  Counting
Processes

Michael  J.  Phelan  .  197 

A  Hierarchical  Multiscale  Processing  of  Images

B.  Gidas  .  215 

A  Maximum  Entropy  Method  for  Expert  Systems  Construction 

Alan  Lippman  .  227 

Probabilistic  Finite  Elements  and  Potential  Application
to  Fracture

Wing  Kam  Liu,  Ted  Belytschko,  Glen  Besterfield,  and

A.  Mani  .  249 

Limit  Theorems  for  the  Size  Effect  in  the  Lifetime 
Distribution  of  a  Fibrous  Composite 

S.  Leigh  Phoenix  and  Chia-Chyuan  Kuo .  267 

Phase  Space  Methods  and  Path  Integration:  A  Microscopic
Approach  to  Direct  and  Inverse  Wave  Propagation 

Louis  Fishman  .  289 

Scale  Invariant  Equations  for  Relativistic  Waves 
Richard  A.  Weiss  .  307 

Relativistic  Wave  Equations  for  Real  Gases 
Richard  A.  Weiss  . 341 

Hamiltonian  Deformations  of  Integrable,  Nonlinear 
Field  Equations 

C. R.  Menyuk,  P.K.A.  Wai,  H.H.  Chen,  and  Y.C.  Lee  .  373 

The  Effects  of  Boundary  Conditions  on  Electromagnetic
Pulses 

K.C.  Heaton  .  387 


On  Fatigue  Life  Prediction  in  Thick-Walled  Cylinders 

S.L.  Pu  and  P.C.T.  Chen .  429 

Analysis  of  Composite  Shrink  Fits  -  Tresca  Material 

Peter  C.T.  Chen .  443 

A  Shallowly  Curved  Shear-Deformable  Beam  Element 
Alexander  Tessler  and  Luciano  Spiridigliozzi  .  455 

Admissible  Elastic  Energy  Density  Functions  for 
Elastomer  Solids 

I.  Fried .  479 

Solutions  of  the  Transonic  Flow  Equations  By
Spectral  Methods 

Patrick  Hanley,  Cathy  Mavriplis  and  Wesley  L.  Harris .  493 

A  Toolkit  of  Symbol  Manipulation  Programs  for
Variational  Grid  Generation 

Stanly  Steinberg  and  Patrick  J.  Roache  .  515 

A  Self-Adaptive  Gridding  for  Inviscid  Transonic 
Projectile  Aerodynamics  Computations 
Chen-Chi  Hsu  and  Chyuan-Gen  Tu  .  533 

On  Computation  of  Transonic  Projectile  Aerodynamics

Chen-Chi  Hsu  and  Nae-Haur  Shiau .  535

Numerical  Simulation  of  Supersonic  Flow  Over  a
Rotating  Band

Jubaraj  Sahu  .  543

Improved  Numerical  Prediction  of  Transonic  Flow
Jubaraj  Sahu  and  Charles  J.  Nietubicz .  561

Numerical  Solution  of  Systems  of  Partial  Differential
Equations 

Richard  E.  Ewing  .  583 

Asymptotic  Stability  of  Viscous  Shock  Waves
F.  A.  Howes .  597

Extensions  of  Sarkovskii's  Theorem
Nam  P.  Bhatia  .  605

Poincare  Maps  of  a  Journal  Bearing
P.J.  Hollis  and  D.L.  Taylor .  611 

Analytical  and  Computational  Studies  of  the  Fluid 
Motion  in  Liquid-Filled  Shells 

Thorwald  Herbert  . 627 



The  Evolution  of  Subharmonic  Edge  Wavepackets 
on  a  Sloping  Beach 

T.R.  Akylas  and  S.  Knopping .  639 

A  Unified  Approach  to  Mass  Property  Computations  in  a 
Solid  Modeling  Environment  With  Application  to  Hydraulic
Structures 

Fred  T.  Tracy .  641 

A  Commonsense  Theory  of  Nonmonotonicity

Frank  M.  Brown  .  653 

Multiobjective  A*:  A  Complete  and  Admissible
Search  Algorithm

Bradley  S.  Stewart  and  Chelsea  C.  White,  III  .  689

Mathematical  Basis  for  Expert  Reasoning 
Forouzan  Golshani  .  709 

Toward  Optimal  Feature  Selection:  Past,  Present
and  Future 

Wojciech  Siedlecki  and  Jack  Sklansky  .  721 

Introducing  Treatments  Into  Test  Procedures 
D.W.  Loveland .  731 

On  the  Errors  That  Learning  Machines  Will  Make

A.  W.  Biermann,  K.C.  Gilbert,  A.  Fahmy,  and

B.  Koster  .  739

A  Model  of  Decision  Making  With  Sequential  Information- 
Acquisition  With  Application  to  the  File  Search  Problem 
James  C.  Moore,  William  Richmond  and  Andrew  B.  Whinston  ....  741 

Comments  on  Multiple  Bifurcations

John  Guckenheimer  . 773 

Measures  of  Block  Design  Efficiency  Recovering  Interblock 
Information 

Walter  T.  Federer,  and  Terry  P.  Speed .  781 

Computing  Asymptotic  Confidence  Bands  for  Nonlinear
Regression  Models 

John  J.  Peterson  .  787 

Testing  Curve  Fit 

Royce  Soanes . 797 

On  the  Estimation  of  Some  Network  Parameters  in  the
PERT  Model  of  Activity  Networks

Salah  E.  Elmaghraby .  809 


Solidification  and  Melting  With  Interfacial  Energy 
and  Entropy 

Morton  E.  Gurtin  .  817 

Numerical  Computation  of  the  Approximate  Analytical 
Solution  of  a  Stefan's  Problem  in  a  Finite  Domain 
Shunsuke  Takagi  .  823 

Thin  Film  Conductive  Coating  for  Surface  Heating 
and  Decontamination 

S.S.  Sadhal,  P.S.  Ayyaswamy,  and  Arthur  K.  Stuempfle .  833 

The  Poiseuille  Flow  of  a  Particle-Fluid  Mixture — 

Effective  Viscosity 

Donald  A.  Drew .  863 

Some  Remarks  on  Blow-Up  in  the  Stefan  Model  for 
Phase  Transitions  and  the  Hele-Shaw  Problem 
S.D.  Howison  .  873 

Global  Optimization  Using  Automatic  Differentiation 
and  Interval  Iteration 

L.  B.  Rall  .  881

Computing  K-Terminal  Reliability  in  Time 

Polynomial  in  the  Number  of  (s,  K)-Quasicuts  .  901

Weak  Greedy  Heuristics  for  Perfect  Matching* 

M. D.  Grigoriadis,  B.  Kalantari,  and  C.Y.  Lai .  909 

Sensitivity  Analysis  for  Stationary  Probabilities 
of  Markov  Chains 

Peter  W.  Glynn  .  917 

Stochastic  Differential  Forms 

Eugene  Wong  .  933 

Filtering  and  Control  for  Wide  Bandwidth  Noise 
and  'Nearly'  Linear  Systems 

H.  J.  Kushner  and  W.  Runggaldier  .  943 

Adaptive  Kalman  Filtering  for  Instrumentation  Radar
Charles  K.  Chui  and  Robert  E.  Green  .  953 

Optimal  Impulse  -  Correction  of  a  Random  Linear
Oscillator 

P.L.  Chew  and  J.L.  Menaldi .  979 

Pulse  -  Arrival  Time  for  Waves  in  Turbulent  Media

P.L.  Chew  and  J.L.  Menaldi .  993 



The  Transition  from  Phase  Locking  to  Drift  in 
a  System  of  Two  Weakly  Coupled  Van  Der  Pol  Oscillators 
Tapesh  Chakraborty  and  Richard  H.  Rand .  1003

Design  and  Implementation  of  a  Multivariate  Adaptive
Control  System  for  Aircraft/Weapon  Applications
Pak  T.  Yip  and  David  Ngo  .  1019

Domain  Contractions  in  Finite-Difference  Computations  of
Poisson's  Equation  by  Means  of  Infinite  Network  Theory
A.H.  Zemanian  . . .  1029

Upwind  Differencing  and  MHD  Equations 
M.  Brio,  C.C.  Wu,  A.  Harten,  and  S.  Osher  .  1047 

Finite  Difference  Methods  for  Polar 
Coordinate  Systems 

John  C.  Strikwerda  and  Yvonne  M.  Nagel  . . .  1059 

Adaptive  Finite  Element  Methods  for  Parabolic 
Systems  in  One  and  Two  Space  Dimensions
Slimane  Adjerid  and  Joseph  E.  Flaherty  .  1077 

A  Posteriori  Error  Estimation  in  A  Finite  Elements 
Method  for  Parabolic  Partial  Differential  Equations 

J.M.  Coyle  and  J.E.  Flaherty . . .  1099 

An  Adaptive  Method  With  Mesh  Moving  and  Local  Mesh 
Refinement  for  Time-Dependent  Partial  Differential  Equations 
David  C.  Arney  and  Joseph  E.  Flaherty .  1115

Vortex  Fission  and  Fusion 

Karl  Gustafson  .  1143 

Incipient  Singularities  in  the  Navier-Stokes
Equations

Alain  Pumir  and  Eric  D.  Siggia . . .  1153

Finite  Element  Approximation  of  a  Reaction- 
Diffusion  Equation 

Sat  Nam  S.  Khalsa  . . .  1157 

High  Resolution,  Minimal  Storage  Algorithms  for 
Convection  Dominated,  Convection-Diffusion  Equations 
V.  Ervin  and  W.  Layton  .  1173 

A  Plane  Premixed  Flame  Problem  With  Two-Step  Kinetics: 

Existence  and  Stability  Questions 

C.  Schmidt-Lainé  .  1203



Controlling  Thermal  Runaway  in  Catalytic  Pellets
Jagdish  Chandra  and  Paul  Davis  . . .  1209 

Propagation  of  a  Plane,  Adiabatic  Flame  Through  a 
Mixture  With  a  Temporal  Enthalpy  Gradient
A.K.  Kapila  and  G.  Ledder  .  1215 

A  2-Dimensional  Scalar  Chandrasekhar  Filter  for 
Image  Restoration 

A.K.  Mahalanabis  and  Kefu  Xue  .  1227 

Object  Tracking  Using  Sensor  Fusion 

Firooz  A.  Sadjadi  and  Michael  E.  Bazakos  .  1241 

Random  Field  Identification  From  a  Sample
Millu  Rosenblatt-Roth  .  1249 

Approximation  of  Two-Dimensional  Random  Fields 
Millu  Rosenblatt-Roth  .  1255 

Interpolation  by  Bivariate  Quadratic  Splines  on  a 
Non-Uniform  Rectangular  Grid 

Charles  Chui,  Harvey  Diamond,  Louise  Raphael  .  1261 

On  the  C2  Continuity  of  Piecewise  Cubic  Hermite 
Polynomials  With  Unequal  Intervals 
C.N.  Shen .  1267 

Views  on  the  Weierstrass  and  Generalized
Weierstrass  Functions 

M.F.  Shlesinger,  M.A.  Hussain,  and  J.T.  Bendler  .  1277 

A  Fast  Algorithm  for  the  Multiplication  of  Generalized 
Hilbert  Matrices  With  Vectors 

Apostolos  Gerasoulis  .  1285 

Effect  of  Rotation  of  the  Lateral  Stability  of  a  Free- 
Flying  Column  Subjected  to  an  Axial  Thrust  With 
Directional  Control 

J.D.  Vasilakis  and  J.J.  Wu .  1297 

Detonation  Wave  Initiation  by  Rapid  Energy 

Deposition  at  a  Confining  Boundary . . .  1309 

Interaction  of  Rotating  Band  and  Rifling  in 
Artillery  Projectiles 

S.  Hanagud  and  H.P.  Chen . 1313



Using  Supercomputers  Today  and  Tomorrow
John  Rice  .  1333 

List  of  Registrants .  1345 




FOURTH  ARMY  CONFERENCE  ON  APPLIED  MATHEMATICS  AND  COMPUTING
Cornell  University,  Ithaca,  New  York
May  27  -  30,  1986
AGENDA 


Tuesday  May  27,  1986

08:00  -  16:00  Registration  -  Warren  Hall  (WH)

08:30  -  09:00  Opening  Remarks  -  WH  131

09:00  -  10:00  General  Session  I  -  WH  131

Chairman:  Dr.  San-Li  Pu,  Benet  Weapons  Laboratory,
Watervliet  Arsenal,  Watervliet,  New  York

Renormalization  Groups  and  problems  in  many  scales  of  length

Kenneth  G.  Wilson,  Cornell  University,  Ithaca,  New  York

10:00  -  10:30  Break


10:30  -  12:30  Technical  Session  TUM1  -  Shock  Waves  -  WH  1-145

Chairperson:  Dr.  Gary  Carrafano,  Benet  Weapons  Laboratory,
Watervliet  Arsenal,  Watervliet,  New  York

10:30  -  10:50  Interactions,  Bifurcations  and  instabilities  of  hydrodynamic
surfaces

C.  Fithian,  J.  Glimm,  J.  Grove,  C.  Gardner,  O.  McBryan,
Courant  Institute  of  Mathematical  Sciences,  New  York,  New
York

10:50  -  11:10  Developments  of  shock  fronts  in  viscoelastic  materials

John  A.  Nohel,  Mathematics  Research  Center,  University  of
Wisconsin,  Madison,  Wisconsin

11:10  -  11:30  Recent  Developments  in  Nonstrictly  Hyperbolic  Conservation
Laws

Michael  Shearer,  North  Carolina  State  University,  Raleigh,
North  Carolina  and  David  G.  Schaeffer,  Duke  University,
Durham,  North  Carolina

11:30  -  11:50  Applications  of  Matrix  Factorization  in  Hydrodynamic
Stability

P.  J.  Morris,  Pennsylvania  State  University,  University
Park,  Pennsylvania

11:50  -  12:10  A  numerical  study  of  the  effect  of  curvature  on  detonation
speed

B.  Bukiet  and  J.  Jones,  Courant  Institute  of  Mathematical
Sciences,  New  York,  New  York

12:10  -  12:30  Pressure  transients  in  a  cavity  due  to  impulsive  loads

C.  Helleur,  Valcartier,  Quebec;  B.  Tabarrok  and  R.  G.
Fenton,  University  of  Toronto,  Toronto,  Ontario,  Canada

12:30  -  14:00  LUNCH


10:30  -  12:30  Technical  Session  TUM2  -  Solid  Mechanics  A  -  WH  1-101

Chairperson:  Dr.  S.  C.  (Mike)  Chu,  ARDC,  Dover,  New  Jersey

10:30  -  10:50  Singularities  in  three-dimensional  potential  theory  and
solid  mechanics  and  finite  element  methods  for  their
treatment

J.  R.  Whiteman,  Brunel  University,  Uxbridge,  U.K.

10:50  -  11:10  Computation  of  weight  functions

T.-L.  Sham,  Rensselaer  Polytechnic  Institute,  Troy,  New
York

11:10  -  11:30  Micromechanics  of  Shearbanding  in  High  Strength  Steel

Dennis  M.  Tracey,  Colin  E.  Freese  and  Paul  J.  Perrone,  Army
Materials  Technology  Laboratory,  Watertown,  Massachusetts

11:30  -  11:50  Fractals,  fragmentation  and  failure

D.  L.  Turcotte,  Cornell  University,  Ithaca,  New  York

11:50  -  12:10  Crack  tip  fields  for  anisotropic  material:  A  finite
element  approach

Roshdy  Barsoum,  Army  Materials  Technology  Laboratory,
Watertown,  Massachusetts

12:10  -  12:30  Numerical  Solution  of  Random  Volterra  Integral  Equations

Negash  Medhin  and  M.  Sambandham,  Atlanta  University,
Atlanta,  Georgia

12:30  -  14:00  LUNCH




10:30  -  12:30  Technical  Session  TUM3  -  Stochastic  Analysis  -  WH  1-201

Chairperson:  Dr.  Vincent  Mirelli,  Night  Vision  and
Electro-Optics  Laboratory,  Fort  Belvoir,
Virginia

10:30  -  10:50  Applications  of  non-Gaussian  Processes

Mircea  Grigoriu,  Cornell  University,  Ithaca,  New  York

10:50  -  11:10  The  method  of  random  characteristics

Marc  A.  Berger  and  Alan  D.  Sloan,  Georgia  Institute  of
Technology,  Atlanta,  Georgia

11:10  -  11:30  Characteristic  Functions  of  a  Class  of  Probability
Distributions

Siegfried  Lehnigk,  Army  Missile  Command,  Redstone  Arsenal,
Alabama

11:30  -  11:50  Poisson  and  extreme  value  limit  theorems  for  Markov  Random
Fields

Simeon  M.  Berman,  Courant  Institute  of  Mathematical
Sciences,  New  York,  New  York

11:50  -  12:10  A  bound  on  the  variation  between  two  probability  measures
in  terms  of  the  intensities  of  a  discrete  point  process
relative  to  these  probabilities

G.  R.  Andersen,  Ballistics  Research  Laboratory,  Aberdeen
Proving  Ground,  Maryland

12:10  -  12:30  Estimating  the  compensator  from  Poisson  type  counting
processes

Michael  Phelan,  Cornell  University,  Ithaca,  New  York

12:30  -  14:00  Lunch

14:00  -  17:15  Special  Session  I  -  Stochastic  Algorithms,  Image  Processing,
and  Computational  Vision  -  WH  1-145

Chairperson:  Professor  Sanjoy  K.  Mitter,  Massachusetts
Institute  of  Technology,  Cambridge,
Massachusetts

14:00  -  14:15  Opening  Remarks

14:15  -  15:00  Global  optimization  via  the  cooling  algorithm  and
computational  complexity

B.  Gidas,  Brown  University,  Providence,  Rhode  Island

15:00  -  15:45  On  the  complexity  of  some  stochastic  search  algorithms

B.  Hajek,  University  of  Illinois,  Urbana,  Illinois  and
Massachusetts  Institute  of  Technology,  Cambridge,
Massachusetts

15:45  -  16:15  Break

16:15  -  16:45  Maximum  Entropy  methods  for  expert  system  construction

A.  Lippman,  Brown  University,  Providence,  Rhode  Island

16:45  -  17:15  Parameter  estimation  in  probabilistic  models  of  images

J.  Marroquin,  PEM,  Mexico  City,  Mexico


14:00  -  17:15  Special  Session  II  -  Probabilistic  Methods  in  Solid
Mechanics  -  WH  1-101

Chairperson:  Professor  N.  U.  Prabhu,  Cornell  University,
Ithaca,  New  York

14:00  -  14:15  Opening  Remarks

14:15  -  14:45  Probabilistic  finite  elements  and  potential  applications  to
fracture

Wing-Kam  Liu  and  Ted  Belytschko,  Northwestern  University,
Evanston,  Illinois

14:45  -  15:15  On  computing  stress  intensity  factors  with  uncertainty

Ram  Srivastav,  Army  Research  Office,  Durham,  North  Carolina

15:15  -  15:45  Limit  theorems  for  the  size  effect  in  the  lifetime
distribution  of  a  Fibrous  Composite

S.  L.  Phoenix,  Cornell  University,  Ithaca,  New  York

15:45  -  16:15  Break

16:15  -  16:45  Approximate  Methods  for  Structural  Reliability

Mircea  Grigoriu  and  Arnold  Buss,  Cornell  University,  Ithaca,
New  York

16:45  -  17:15  Critical  strains  for  adiabatic  shear

Gerald  Moss,  Ballistics  Research  Laboratory,  Aberdeen
Proving  Ground,  Maryland




19:00  -  22:00  Banquet  (Prepaid  participants  only)

Speaker:  Dr.  Herbert  Hauptman,  President  and  Research
Director,  Medical  Foundation  of  Buffalo


Wednesday  May  28,  1986 

08:00  -  16:00  Registration  -  Warren  Hall  (WH) 

08:30  -  10:30  Technical  Session  WM1  -  Electromagnetics  -  WH  1-201 

Chairperson:  Dr.  Siegfried  Lehnigk,  Army  Missile  Command, 
Redstone  Arsenal,  Alabama 

08:30  -  08:50  Numerical  computation  of  path  integral  representations  of 

scalar  wave  field  propagators 

Louis  Fishman,  Catholic  University,  Washington,  D.  C. 

08:50  -  09:10  Scale  invariant  equations  for  Relativistic  Waves 

Richard  Weiss,  Army  Corps  of  Engineers,  Waterways
Experimental  Station,  Vicksburg,  Mississippi

09:10  -  09:30  Relativistic  wave  equations  for  Real  Gases 

Richard  Weiss,  Army  Corps  of  Engineers,  Waterways
Experimental  Station,  Vicksburg,  Mississippi

09:30  -  09:50  Hamiltonian  perturbations  of  the  nonlinear  Schroedinger 

Equation 

Curtis  R.  Menyuk,  University  of  Maryland,  College  Park,
Maryland

09:50  -  10:10  Solutions  of  a  non-integrable  Hamiltonian  system

P.  K.  A.  Wai,  C.  R.  Menyuk,  H.  H.  Chen,  and  Y.  C.  Lee,
University  of  Maryland,  College  Park,  Maryland

10:10  -  10:30  The  effects  of  boundary  conditions  on  Electromagnetic

pulses

K.  C.  Heaton,  Defense  Research  Establishment,  Valcartier, 
Quebec,  Canada 

10:30  -  11:00  Break 


08:30  -  10:30  Technical  Session  WM2  -  Solid  Mechanics  B  -  WH  1-101 

Chairperson:  Dr.  T.  W.  Wright,  Ballistics  Research 

Laboratory,  Aberdeen  Proving  Ground,  Maryland 


08:30  -  08:50  Discovery  of  the  elastic  parameters  of  a  layered  half-space

Paul  Sacks,  Iowa  State  University,  Ames,  Iowa

08:50  -  09:10  Stability  of  free-free  columns

Julian  J.  Wu,  Army  European  Research  Office,  London,  UK

John  D.  Vasilakis,  ARDC  Benet  Weapons  Laboratory,  Watervliet
Arsenal,  Watervliet,  New  York

09:10  -  09:30  On  fatigue  life  prediction  in  thick-walled  cylinders

S.  L.  Pu  and  P.  C.  T.  Chen,  ARDC  Benet  Weapons  Laboratory,
Watervliet  Arsenal,  Watervliet,  New  York

09:30  -  09:50  Analysis  of  composite  shrink  fits  -  Tresca  Material

Peter  C.  T.  Chen,  ARDC  Benet  Weapons  Laboratory,  Watervliet
Arsenal,  Watervliet,  New  York

09:50  -  10:10  A  shallowly  curved  shear-deformable  beam  element

A.  Tessler  and  L.  Spiridigliozzi,  Army  Materials  Technology
Laboratory,  Watertown,  Massachusetts

10:10  -  10:30  Admissible  elastic  energy  density  functions  for  rubber-like
solids

I.  Fried,  Boston  University;  A.  R.  Johnson  and  C.  J.
Quigley,  Army  Materials  Technology  Laboratory,  Watertown,
Massachusetts

10:30  -  11:00  Break


08:30  -  10:30  Technical  Session  -  WM3  -  Transonic  Flow  -  WH  1-145

Chairperson:  Dr.  John  Polk,  Ballistics  Research
Laboratory,  Aberdeen  Proving  Ground,  Maryland

08:30  -  08:50  Solutions  of  the  Transonic  Flow  Equations  by  Spectral
Methods

P.  Hanley,  C.  Mavriplis,  Massachusetts  Institute  of
Technology,  Cambridge,  Massachusetts  and  W.  L.  Harris,
University  of  Connecticut,  Storrs,  Connecticut

08:50  -  09:10  A  toolkit  of  symbol  manipulation  programs  for  variational
grid  generation

Stanly  Steinberg,  University  of  New  Mexico,  Albuquerque  and
Patrick  J.  Roache,  Ecodynamics  Research  Associates,
Albuquerque,  New  Mexico

09:10  -  09:30  A  self-adaptive  gridding  for  inviscid  transonic  projectile
aerodynamic  computation

Chen-Chi  Hsu  and  Chyuan-Gen  Tu,  University  of  Florida,
Gainesville,  Florida

09:30  -  09:50  On  computation  of  transonic  projectile  aerodynamics

Chen-Chi  Hsu  and  Nae-Hauer  Shiau,  University  of  Florida,
Gainesville,  Florida

09:50  -  10:10  Numerical  simulation  of  supersonic  flow  over  a  rotating
band

J.  Sahu,  Ballistics  Research  Laboratory,  Launch  and  Flight
Division,  Aberdeen  Proving  Ground,  Maryland

10:10  -  10:30  Improved  numerical  prediction  of  transonic  flow

J.  Sahu  and  C.  J.  Nietubicz,  Ballistics  Research
Laboratory,  Launch  and  Flight  Division,  Aberdeen  Proving  Ground,
Maryland

10:30  -  11:00  Break


11:00  -  12:00  General  Session  II  -  WH  131

Chairperson:  Dr.  Billy  Jenkins,  Army  Missile  Command,
Redstone  Arsenal,  Alabama

Numerical  Solution  of  Partial  Differential  Equations

Richard  Ewing,  University  of  Wyoming,  Laramie,  Wyoming

12:00  -  13:30  Lunch

13:30  -  15:50  Technical  Session  WA1  -  Nonlinear  Analysis  and  Control  A
-  WH  1-145

Chairperson:  Dr.  William  Jackson,  Army  Tank  Automotive
Command,  Warren,  Michigan

13:30  -  13:50  Some  stability  results  for  advection-diffusion  equations

F.  A.  Howes,  Lawrence  Livermore  Laboratory,  Livermore,
California

13:50  -  14:10  Spatial  structure  of  time-periodic  solutions  of  the
Ginzburg-Landau  equation

Philip  Holmes,  Cornell  University,  Ithaca,  New  York

14:10  -  14:30  Dissipation  in  conservative  systems


Mark  Levi,  Boston  University,  Boston,  Massachusetts

14:30  -  14:50  An  analysis  of  the  Duffing's  equation  through  examination
of  initial  condition  maps  and  Liapunov  exponents

Charles  Pezeshki  and  Earl  Dowell,  Duke  University,  Durham,
North  Carolina

14:50  -  15:10  Poincare  maps  of  a  journal  bearing

P.  J.  Hollis  and  D.  L.  Taylor,  Cornell  University,  Ithaca,
New  York

15:10  -  15:30  The  propagation  of  information  and  uncertainty  in  dynamical
systems

David  F.  Delchamps,  Cornell  University,  Ithaca,  New  York

15:30  -  15:50  Extension  of  Sarkovskii's  theorem

Walter  Egerland,  Ballistics  Research  Laboratory,  Aberdeen
Proving  Ground,  Maryland

15:50  -  16:15  Break


13:30  -  15:50  Technical  Session  -  WA2  -  Fluid  Mechanics  -  WH  1-101

Chairperson:  Dr.  Miles  Miller,  Chemical  Research  and
Development  Center,  Edgewood  Arsenal,  Maryland

13:30  -  13:50  Fluid  Motion  in  liquid-filled  shells

T.  Herbert,  Virginia  Polytechnic  Institute  and  State
University,  Blacksburg,  Virginia

13:50  -  14:10  The  evolution  of  subharmonic  edge  wavepackets  on  a  sloping
beach

T.  R.  Akylas  and  S.  Knopping,  Massachusetts  Institute  of
Technology,  Cambridge,  Massachusetts

14:10  -  14:30  Theoretical  and  Simulation  studies  of  surfactants  at  liquid
interfaces

J.  H.  Thurtell  and  K.  E.  Gubbins,  Cornell  University,
Ithaca,  New  York

14:30  -  14:50  Static  capillary  bridges:  Global  stability  results  for
symmetrization  methods

Paul  H.  Steen,  Cornell  University,  Ithaca,  New  York

14:50  -  15:10  Rod  like  particles  in  second-order  fluid  under  simple  shear

Bin  Chung,  IBM,  San  Jose,  California  and  Claude  Cohen,
Cornell  University,  Ithaca,  New  York

15:10  -  15:30  A  unified  approach  to  mass  property  computations  in  a  solid
model  environment  with  application  to  hydraulic  structures

Fred  T.  Tracy,  Army  Corps  of  Engineers,  Waterways
Experimental  Station,  Vicksburg,  Mississippi

15:30  -  15:50  The  Stokes  Limit  of  the  Flow  in  a  rotating  spinning  cylinder

Raymond  Sedney,  Ballistics  Research  Laboratory,  Aberdeen
Proving  Ground,  Maryland

15:50  -  16:15  Break


13:30  -  15:50  Technical  Session  -  WA3  -  AI  and  Expert  Systems  -  WH  1-201

Chairperson:  Dr.  Ralph  Harrison,  Army  Materials
Technology  Laboratory,  Watertown,
Massachusetts

13:30  -  13:50  A  commonsense  theory  of  nonmonotonicity

Frank  M.  Brown,  Artificial  Intelligence  Research  Institute,
Austin,  Texas

13:50  -  14:10  Multiobjective  A*

Bradley  Stewart  and  Chelsea  White,  University  of  Virginia,
Charlottesville,  Virginia

14:10  -  14:30  Mathematical  basis  for  expert  reasoning

Forouzan  Golshani,  Arizona  State  University,  Tempe,  Arizona

14:30  -  14:50  Toward  Optimal  Feature  Selection:  Past,  Present,  and  Future

W.  Siedlecki  and  J.  Sklansky,  University  of  California,
Irvine,  California

14:50  -  15:10  Introducing  treatments  into  Test  Procedures

D.  W.  Loveland,  Duke  University,  Durham,  North  Carolina

15:10  -  15:30  On  the  errors  that  learning  machines  will  make

A.  W.  Biermann,  K.  C.  Gilbert,  A.  Fahmy  and  B.  Koster,  Duke
University,  Durham,  North  Carolina


15:30  -  15:50  A  model  of  decision-making  with  sequential  information-
acquisition  with  application  to  the  file  search  problem

James  C.  Moore  and  Andrew  B.  Whinston,  Purdue  University,
West  Lafayette,  Indiana

15:50  -  16:15  Break

16:15  -  17:15  General  Session  III  -  WH  131

Chairman:  Dr.  Gary  Anderson,  Army  Research  Office,
Durham,  North  Carolina

Multiple  Bifurcation

John  Guckenheimer,  Cornell  University,  Ithaca,  New  York


Thursday  May  29,  1986

08:00  -  16:00  Registration  -  Warren  Hall  (WH)

08:30  -  10:30  Technical  Session  -  THM1  -  Statistics  and  Data  Analysis  -
WH  1-201

Chairperson:  Major  Rickey  Kolb,  United  States  Military
Academy,  West  Point,  New  York

08:30  -  08:50  On  a  measure  of  block  design  efficiency  recovering
interblock  information

Walter  T.  Federer,  Cornell  University,  Ithaca,  New  York

08:50  -  09:10  Computing  asymptotic  confidence  bands  for  Nonlinear
regression  models

John  J.  Peterson,  Syracuse  University,  Syracuse,  New  York

09:10  -  09:30  Applying  statistical  graphics  to  multivariate  data

Steven  J.  Schwager,  Cornell  University,  Ithaca,  New  York

09:30  -  09:50  Unimodular  dynamics  of  SF6  under  coherent  excitation

John  C.  England,  Frederic  A.  Hopf  and  Charles  M.  Bowden,
U.  S.  Army  Missile  Command,  Redstone  Arsenal,  Alabama

09:50  -  10:10  Testing  Curve  Fit

Royce  Soanes,  Benet  Weapons  Laboratory,  Watervliet  Arsenal,
Watervliet,  New  York


xxli 


10:10 - 10:30  The estimation of some network parameters in the PERT model of activity networks: Review and critique

Salah E. Elmaghraby, North Carolina State University,
Raleigh,  North  Carolina 

10:30  -  11:00  Break 


08:30 - 10:30  Technical Session - THM2 - Multiphase Flow - WH 1-101

Chairperson: Dr. Csaba Zoltani, Ballistics Research
Laboratory, Aberdeen Proving Ground, Maryland

08:30 - 08:50  On the two-phase Stefan problem with interfacial energy and entropy

Morton E. Gurtin, Carnegie-Mellon University, Pittsburgh,
Pennsylvania

08:50 - 09:10  Stefan's Problem in a Finite Domain with constant boundary and initial conditions

Shunsuke Takagi, Cold Region Research and Engineering
Laboratory, Hanover, New Hampshire

09:10 - 09:30  Thin film conductive coating for surface heating and decontamination

S. S. Sadhal, University of Southern California, Los
Angeles, California, P. S. Ayyaswamy, University of
Pennsylvania, Philadelphia, Pennsylvania and Arthur K.
Stuempfle, Chemical Research and Development Center,
Edgewood Arsenal, Maryland

09:30 - 09:50  The Poiseuille Flow of a particle-fluid mixture effective viscosity

Donald A. Drew, Rensselaer Polytechnic Institute, Troy, New
York

09:50 - 10:10  Fluids in Narrow Pores: Computer Simulation and Mean Field Theory

B. K. Peterson and K. E. Gubbins, Cornell University and
J. P. R. B. Walton, B. P. Research Centre, United Kingdom

10:10 - 10:30  Macroscopic and microscopic modelling of mushy regions

S. D. Howison, Oxford University, United Kingdom

10:30 - 11:00  Break


08:30 - 10:30  Technical Session - THM3 - Optimization - WH 1-145

Chairperson: Dr. Ervin Atzinger, Army Materiel Systems
Analysis Activity, Aberdeen Proving Ground, Maryland

08:30 - 08:50  Global Optimization using Automatic Differentiation and Interval Iteration

Louis B. Rall, Mathematics Research Center, University of
Wisconsin, Madison, Wisconsin

08:50 - 09:10  Computing disjoint products efficiently

Michael O. Ball, University of Maryland, College Park,
Maryland and J. Scott Provan, University of North Carolina,
Chapel Hill, North Carolina

09:10 - 09:30  Classes of Greedy Matching Heuristics

M. D. Grigoriadis and B. Kalantari, Rutgers University, New
Brunswick, New Jersey

09:30 - 09:50  Optimization of stochastic systems via Monte Carlo simulation

Peter W. Glynn, Mathematics Research Center, University of
Wisconsin, Madison, Wisconsin

09:50 - 10:10  Efficient sequencing of a four-circle diffractometer

Robert G. Bland and David F. Shallcross, Cornell University,
Ithaca, New York

10:10 - 10:30  Optimal procedure for dynamic programs with complex loop structures

A. O. Esogbue, Georgia Institute of Technology and C. Y.
Lee, Institute of Technology, Taejon, Korea

10:30 - 11:00  Break


11:00  -  12:00  General  Session  IV  -  WH  131 

Chairman:  Dr.  Stephen  Wolff,  Ballistics  Research  Laboratory 
Aberdeen  Proving  Ground,  Maryland 

Stochastic  Differential  Forms 

Eugene  Wong,  University  of  California,  Berkeley 

12:00  -  13:30  Lunch 


13:30 - 15:50  Technical Session - THA1 - Nonlinear Analysis and Control B - WH 1-145

Chairperson:  Dr.  Norman  Coleman,  Armament  Research  and 

Development  Command,  Dover,  New  Jersey 

13:30 - 13:50  Stochastic filtering and control with wide bandwidth observation noise

Harold J. Kushner, Brown University, Providence, Rhode
Island

13:50  -  14:10  Adaptive  Kalman  Filtering  for  Instrumentation  Radar 

Charles K. Chui, Texas A&M University, College Station,
Texas and Robert E. Green, White Sands Missile Range, New
Mexico

14:10 - 14:30  Optimal impulse correction of a random linear operator

P. L. Chow and J. L. Menaldi, Wayne State University,
Detroit, Michigan

14:30  -  14:50  Pulse  arrival  times  for  waves  in  turbulent  media 

P.  L.  Chow  and  J.  L.  Menaldi,  Wayne  State  University, 
Detroit,  Michigan 

14:50 - 15:10  Efficient Parallel Algorithms for controllability and eigenvalue assignment problems

B. N. Datta and Karabi Datta, Northern Illinois University,
DeKalb,  Illinois 

15:10 - 15:30  The transition from phase-locking to drift in a system of two weakly coupled van der Pol oscillators

Tapesh Chakraborti and Richard H. Rand, Cornell University,
Ithaca,  New  York 

15:30 - 15:50  Design and implementation of a Multivariable control system for Aircraft/Weapon Applications

Pak T. Yip and David Ngo, SMCAR-FSF-RC, ARDC, Dover, New
Jersey

15:50  -  16:15  Break 


13:30  -15:50  Technical  Session  THA2  -  NUMERICAL  PDE  -  WH  1-101 

Chairperson:  Dr.  Nisheeth  Patel,  Ballistics  Research 

Laboratory,  Aberdeen  Proving  Ground,  Maryland 


13:30 - 13:50  Domain Contractions around three dimensional anomalies in spherical finite difference computations of Poisson's Equation

A. H. Zemanian and T. S. Zemanian, State University of New
York at Stony Brook, Stony Brook, New York

13:50 - 14:10  Upwind schemes and numerical solutions to the MHD Riemann problem

M. Brio, C. C. Wu, S. J. Osher, A. Harten, University of
California, Los Angeles, California

14:10 - 14:30  Finite-difference methods for polar coordinate systems

John C. Strikwerda and Yvonne Nagel, Mathematics Research
Center, University of Wisconsin, Madison, Wisconsin

14:30 - 14:50  Adaptive Finite Element Methods for Parabolic systems in one- and two-space dimensions

Slimane Adjerid, Rensselaer Polytechnic Institute, Troy and
Joseph E. Flaherty, Rensselaer Polytechnic Institute and
Benet Weapons Laboratory, Watervliet Arsenal, Watervliet,
New York

14:50 - 15:10  Fast parallel algorithms via domain decomposition for elliptic problems

J. H. Bramble, Cornell University, Ithaca, New York, J.
Pasciak, Brookhaven National Laboratory, Upton, New York
and A. H. Schatz, Cornell University, Ithaca, New York

15:10 - 15:30  A posteriori error estimation in a finite element method for parabolic partial differential equation

J. M. Coyle and J. E. Flaherty, Benet Weapons Laboratory,
Watervliet Arsenal, Watervliet, New York

15:30 - 15:50  An adaptive method with mesh moving and local mesh refinement for time dependent partial differential equations

David C. Arney, United States Military Academy, West Point,
New York and J. E. Flaherty, Rensselaer Polytechnic Institute


ABSTRACT

The method of front tracking has been demonstrated to provide high
resolution of hydrodynamic interfaces. A basic motive for developing this
method was to allow a study of the transition to chaos in the case of interface
instability. We also show that interactions of tracked waves and bifurcations
of interface topology can in certain cases be computed automatically.

These results are then applied to the study of jets and of fingers formed
by the Rayleigh-Taylor and Meshkov instabilities. A statistical model for the
chaotic regime, due to J. A. Wheeler and one of the authors (D.H.S.), is
presented, and its relation to the above computations is outlined.

We also discuss modifications of the front tracking method due to
gravitational and geometrical source terms in the Euler equations, and work in
progress concerning use of equations of state for real materials.

1. Introduction

Proof of scientific principle for the front tracking method has been established in a
series of papers by the authors and coworkers [1,2,3,4,5]. This method has been adapted
to a variety of problem areas, including oil reservoirs, shock tube experiments, astrophysics,
and detonation waves. Recent improvements of this method allow consideration of more
complex and interesting problems. Here we consider the interaction between tracked waves,
and interface instabilities in the compressible regime (including the Meshkov,
Rayleigh-Taylor, and supersonic jet instabilities). We also discuss modifications of the front tracking
method due to gravitational and geometrical source terms in the Euler equations, and report
on work in progress to allow for the effects of real equations of state.


2.  A  Description  of  the  Front  Tracking  Method 

Front  tracking  is  an  adaptive  computational  method  for  solving  a  hyperbolic  system  of 
nonlinear conservation laws. In two-dimensional problems, a moving one-dimensional grid,
called  the  front,  is  fitted  to  and  tracks  selected  waves  in  the  solution.  These  waves  can  be 
sharp  discontinuities  which  exist  as  mathematical  solutions  of  idealized  physical  equations 
(e.g.  the  Euler  equations)  or  waves  for  which  physical  quantities  change  rapidly  but  smoothly 
over  a  fraction  of  a  mesh  length  (e.g.  chemical  reaction  fronts).  For  compressible  fluid 
dynamics,  these  waves  include  shock  waves,  contact  discontinuities,  material  interfaces,  phase 
boundaries,  slip  lines,  and  chemical  reaction  fronts. 

The  front  tracking  code  employs  a  finite  difference  method  together  with  the  tracking  of 
selected  waves  to  solve  the  two-dimensional  Euler  equations  of  compressible  gas  dynamics  in 
conservation  form.  The  Euler  equations  for  a  compressible,  inviscid  gas  can  be  cast  in  the 
form  of  a  general  hyperbolic  system  of  nonlinear  conservation  laws 


\[ w_t + \nabla \cdot f(w) = 0 \tag{2.1} \]

by setting

\[ w = \begin{pmatrix} \rho \\ m \\ E \end{pmatrix} \quad \text{and} \quad f(w) = \begin{pmatrix} m \\ \dfrac{m \otimes m}{\rho} + p\,I \\ \dfrac{(E + p)\,m}{\rho} \end{pmatrix}. \tag{2.2} \]

Here $\rho$ is the mass density, $m = \rho u$ is the momentum density, $E$ is the total energy density, and $p$
is the thermodynamic pressure. The total energy can be written as $E = \rho e + \frac{m \cdot m}{2\rho}$, where $e$
is the specific internal energy. The pressure, density and specific internal energy are related
by a caloric equation of state; thus only two of the three quantities are independent. Eqs.
(2.1) and (2.2) express the conservation of mass, momentum, and energy.
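
As an illustration of Eqs. (2.1) and (2.2), the following minimal Python sketch evaluates the two Cartesian flux components for a two-dimensional state. It is not taken from the front tracking code: the function names are hypothetical, and a polytropic (gamma law) gas with $\gamma = 1.4$ is assumed to close the system.

```python
GAMMA = 1.4  # assumed polytropic exponent (illustrative choice)


def pressure(rho, mx, my, E, gamma=GAMMA):
    """Polytropic pressure from conserved variables.

    Uses E = rho*e + |m|^2 / (2 rho) and p = (gamma - 1) * rho * e.
    """
    kinetic = (mx * mx + my * my) / (2.0 * rho)
    return (gamma - 1.0) * (E - kinetic)


def euler_flux(w, gamma=GAMMA):
    """Return the x- and y-components of f(w) for w = (rho, mx, my, E)."""
    rho, mx, my, E = w
    p = pressure(rho, mx, my, E, gamma)
    ux, uy = mx / rho, my / rho
    # Each flux component carries mass, x-momentum, y-momentum, energy.
    fx = (mx, mx * ux + p, my * ux, (E + p) * ux)
    fy = (my, mx * uy, my * uy + p, (E + p) * uy)
    return fx, fy
```

For a gas at rest the only nonzero flux entries are the pressure terms in the momentum equations, which gives a quick sanity check on the layout of $f(w)$.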

The  front  divides  the  computational  grid  into  topologically  connected  interior  regions 
called  components.  The  solution  is  computed  by  first  propagating  the  front  and  then  the  solu¬ 
tion  in  each  component. 


The front is advanced in two steps. First the Rankine-Hugoniot equations are used to
propagate the front normally by solving a nonlocal Riemann problem. Then tangential waves
are propagated along the front using a one-dimensional Lax-Wendroff method. At points
(called nodes) where discontinuity curves intersect, the propagation is defined by the solution
of shock polar equations, as a first approximation to solving a two-dimensional Riemann
problem. The propagation of the front for the Euler equations without source terms is
described more thoroughly in Ref. [4], while front tracking and two-dimensional Riemann
problems are discussed in Ref. [6].


The interior regions between fronts are treated as initial-boundary value problems and
the solutions in these regions are computed using an operator split Lax-Wendroff
finite-difference method. The front and interior schemes are coupled in a strip of width one mesh
spacing on either side of the front.

3. Shock and Contact Wave Interactions

We distinguish between scalar and vector waves. The scalar waves such as contacts,
material interfaces, phase boundaries and concentration waves do not produce reflected
waves on interaction whereas the vector waves such as shock waves in gas dynamics do. A
fairly general algorithm for resolving the interaction of scalar waves was presented in Ref.
[7], in the context of tracked saturation fronts in oil reservoirs. We show the type of complex
interface that can develop from a simple one in Fig. 3.1. We are also interested in the
interaction  of  (vector)  shock  waves  with  contact  waves  and  with  each  other.  As  a  model 
problem  for  the  study  of  this  interaction,  we  consider  a  simplified  version  of  the  Meshkov 
instability,  in  which  a  shock  wave  hits  a  contact  having  a  small  (sine  wave)  perturbation  from 
planar.  After  passage  of  the  shock  wave,  this  perturbation  grows  at  first  exponentially  and 
then  linearly  in  time  before  coming  to  rest.  The  late  time  behavior  is  discussed  in  the  next 
section.  Here  we  describe  the  sequence  of  shock  interaction  problems  that  take  place  in  the 
initiation  process. 

In  Fig.  3.2  we  show  a  sequence  of  shock  and  contact  fronts  for  a  shock  wave  hitting  an 
interface  between  warm  and  cold  air,  while  in  Fig.  3.3  we  show  a  similar  sequence  where  the 
interface  separates  air  and  SF6  at  the  same  temperature.  In  both  cases  the  shock  is  incident 
in  the  lighter  gas  (the  warmer  air  in  Fig.  3.2  and  the  air  in  Fig.  3.3).  Each  simulation  begins 
shortly  before  the  shock  wave  collides  with  the  contact  discontinuity  surface.  When  the 
shock  wave  reaches  this  surface  it  is  transmitted  through  and  reflected  by  the  contact.  The 
contact  discontinuity  is  in  turn  deflected  by  this  interaction.  We  observe  diffracted  wave  pat¬ 
terns  propagating  away  from  the  original  point  of  collision  as  the  shock  continues  to  pro¬ 
pagate  into  the  gas  interface.  Eventually  the  shock  wave  will  pass  completely  through  the 
contact  discontinuity,  and  the  reflected  and  transmitted  waves  will  propagate  away  from  each 
other  on  opposite  sides  of  the  gas  interface.  In  general  this  will  produce  complicated  wave 
interactions,  but  in  our  model  we  only  track  the  transmitted  shock  and  the  reflected  wave  (if 
it  is  a  shock).  This  approximation  assumes  the  other  waves  produced  by  this  interaction  are 
weak  enough  not  to  require  tracking. 

The  front  tracking  code  is  well  suited  for  the  propagation  of  interior  points  on  tracked 
curves,  but  must  be  extended  to  handle  the  complicated  wave  patterns  that  occur  when  two  or 
more  waves  interact  at  a  single  point.  In  the  shock-contact  interaction  each  of  the  diffraction 
patterns  consists  of  an  incident  shock  colliding  with  a  contact  discontinuity  producing 
reflected  and  transmitted  waves.  Such  a  configuration  will  be  called  a  diffraction  node.  The 
analysis  of  the  interaction  between  a  planar  shock  wave  and  a  planar  contact  discontinuity  has 
been  discussed  in  detail  in  Refs.  [8,9,10,11,6]  and  here  we  will  only  summarize  these  results 
as  they  are  applied  in  the  front  tracking  code.  In  a  neighborhood  of  a  diffraction  node  we 
ignore  any  curvature  and  replace  the  two  colliding  curves  by  their  tangents.  We  next  assume 
that  there  exists  a  reference  frame  in  which  the  flow  near  the  point  of  interaction  is  steady. 
Finally  we  restrict  our  attention  to  the  so  called  regular  reflection  case  in  which  the  interac¬ 
tion  occurs  at  a  single  point,  the  transmitted  wave  is  a  shock  and  the  reflected  wave  is  either 
a  shock  or  a  centered  rarefaction  wave.  (More  complicated  configurations  include  Mach  and 
multiple  Mach  type  reflections.)  This  assumption  is  valid  provided  the  angle  between  the 
incident  shock  and  the  contact  discontinuity  is  sufficiently  small.  Since  flow  does  not  cross  a 
contact  discontinuity,  the  stream  lines  on  opposite  sides  of  the  interface  must  be  parallel. 
This  means  that  the  flow  through  the  incident  shock  and  the  reflected  wave  must  be  turned 
by  the  same  amount  as  the  flow  through  the  transmitted  shock.  If  we  assume  that  the  states 
of  the  gas  on  both  sides  of  the  contact  discontinuity  ahead  of  the  incident  shock  are  known, 
together  with  the  strength  of  the  incident  shock  (say  the  pressure  jump  across  the  shock), 
then the Rankine-Hugoniot conditions together with this restriction provide a system of
algebraic equations from which the pressure behind the reflected and transmitted waves can be
found (this solution may be multi-valued). This pressure can then be used to construct the
states behind the transmitted shock and reflected wave along with the angles at which these
waves meet the point of shock diffraction.

Since we are dealing with curved waves, this calculation is performed at each timestep.
The transformation to the steady frame of an individual diffraction node is found by a
geometric construction. The incident shock and the ahead contact discontinuity are first
propagated separately, ignoring any interaction between the two waves. The intersection
between the two propagated curves is found and this is used as the updated node position,
from which a node velocity is computed. This velocity defines the transformation to the
steady frame of the node. New states and wave angles about the diffraction node are
computed and inserted into the tracked wave structures.

The geometrical construction of the node velocity is also important since it provides a
method of detecting wave interactions. When the original shock passes through the contact,
the ahead curves will both be short segments that will propagate past one another in the finite
time $\Delta t$. The propagated curves do not intersect and hence a node velocity cannot be
computed. At this point control is shifted to routines designed to identify and handle such
interactions.
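
The geometric construction described above can be sketched in a few lines of Python. This is an illustration only, under the simplifying assumption that each propagated curve is represented near the node by a single straight segment; the function names and the use of `None` as the "no intersection" signal are hypothetical and are not taken from the front tracking code.

```python
def intersect_segments(p1, p2, q1, q2, eps=1e-12):
    """Intersection point of segments p1-p2 and q1-q2, or None if they miss."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    if abs(denom) < eps:
        return None  # parallel (or degenerate) segments
    dx, dy = q1[0] - p1[0], q1[1] - p1[1]
    t = (dx * sy - dy * sx) / denom  # parameter along p1-p2
    s = (dx * ry - dy * rx) / denom  # parameter along q1-q2
    if 0.0 <= t <= 1.0 and 0.0 <= s <= 1.0:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def node_velocity(old_node, shock_seg, contact_seg, dt):
    """Velocity of a diffraction node over one timestep.

    shock_seg and contact_seg are the separately propagated segments,
    each given as a pair of end points. Returns None when the propagated
    segments no longer cross, which is the cue to hand control to the
    interaction-handling routines.
    """
    new_node = intersect_segments(shock_seg[0], shock_seg[1],
                                  contact_seg[0], contact_seg[1])
    if new_node is None:
        return None
    return ((new_node[0] - old_node[0]) / dt,
            (new_node[1] - old_node[1]) / dt)
```

Returning `None` rather than raising an error mirrors the role the construction plays in the text: a missing intersection is not a failure but the detection of a wave interaction.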

4.  Interface  Instabilities 

We  have  studied  a  series  of  related  problems,  each  of  which  leads  to  fingering  instabili¬ 
ties  or  jets,  with  the  penetration  of  a  heavy  material  into  a  lighter  ambient  material.  Followed 
to  late  time,  this  leads  to  a  chaotic  mixing  regime  discussed  in  the  next  section.  The  series  of 
problems  arise  from  different  procedures  to  initiate  this  instability,  as  an  accelerated  surface 
[12],  supersonic  jet  [13],  shock-contact  collision  [14],  or  Rayleigh-Taylor  instability  [15]. 
We have considered a range of density ratios up to 100:1 and accelerating forces, which for
the Rayleigh-Taylor problem are in the range of up to $10^6$ to $10^9$ g depending on the length
scale of the perturbation considered. The Mach numbers considered spanned a range from
0.1 to 6.

The compressible Rayleigh-Taylor problem depends on three dimensionless parameters:
the density ratio $D = \rho_b / \rho_a$, where $\rho_b$ is the density of the heavy gas just below the interface
(we assume gravity points up) and $\rho_a$ is the density of the light gas just above the interface;
the polytropic gas constant $\gamma$ (here we set $\gamma_a = \gamma_b = 1.4$) or other information to set the
equation of state for the heavy and light fluids; and a Mach number $M$ defining the ratio of a
gravitational time scale to a sound speed time scale. $M$ defines a dimensionless compressibility.
We take $M^2 = g\lambda / c_b^2$, where $\lambda$ is the wavelength of the interface perturbation and $c_b$ is the
sound speed in the unperturbed heavy fluid. In Fig. 4.1 we show a sequence of interface
positions for a compressible heavy gas falling into a lighter gas with $D = 2$ and $M^2 = 0.5$. In
this case the terminal Mach number of the bubble and spike is about 0.2. In Fig. 4.2 we show
the case of four symmetric bubbles and spikes, for $D = 10$ and $M^2 = 0.89$. In Fig. 4.3 we
show a similar sequence for $D = 10$ and $M^2 = 0.89$, in which there is a capture of the
smaller side bubbles by the larger central one. (For an interface with multiple modes, we
give the maximum value of $M^2$.) We refer to the cases of single bubble dynamics and of
bubble capture as the one and two body problems of bubble dynamics; they are central to the
statistical model for the mixing regime discussed in the next section.

Computations  of  supersonic  jets  by  Norman,  Smarr,  and  Winkler  (NSW)  [16]  have  gen¬ 
erated  a  great  deal  of  interest,  due  to  their  qualitative  agreement  with  observations  and  their 
quantitative  predictions.  Since  the  radio  telescope  observations  will  become  more  detailed  in 
the  near  future,  it  is  of  great  interest  to  compare  computations  of  supersonic  jets  by  different 
methods.  To  this  purpose,  our  computations  using  a  "surface"  front  tracking  method  may  be 
contrasted  with  the  results  obtained  by  NSW  using  a  "volume"  front  tracking  method.  We 
find overall agreement in the wave structure of the computations, but find a marked difference
in the details of the contact boundary between the jet and ambient gases. We believe
our method offers a higher degree of resolution of the tracked contact, since our method
tracks it as a sharp discontinuity rather than as a "smeared out" interface, and preserves the
integrity of the tracked front from step to step.

Fig. 4.4 displays the evolution to late time of a cylindrically symmetric Mach 3 jet. The
density ratio of jet gas to ambient gas is 10:1. $\gamma$ was set equal to 5/3. Note the presence of a
bow wave in front of the jet and of a terminal shock near the head of the jet beam, preceded
by a rarefaction wave. This terminal shock system may explain the observed hot spots
terminating astrophysical jets. The contact shape displays large-scale Kelvin-Helmholtz rollup,
and the development of two-dimensional pinch waves.

5.  The  Mixing  Regime 

The  late  stages  of  a  Rayleigh-Taylor  unstable  interface  lead  to  a  chaotic  mixing  regime. 
The  portion  of  the  mixing  layer  adjacent  to  the  heavy  fluid  is  dominated  by  the  mechanism  of 
bubble  merger  or  amalgamation.  A  model  for  bubble  merger  due  to  J.  A.  Wheeler  and  one 
of the authors (D.H.S.) [17] (a brief description is also contained in [18]) has been analyzed
numerically. In the model, it is assumed that the interface is piecewise constant and single
valued, so that the bubbles are the piecewise constant intervals in the interface. A simple
scaling argument shows that the bubble velocity is $v = \mathrm{const}\,(gr)^{1/2}$, where $r$ is the bubble
radius. The constant is a function of the dimensionless parameters of the problem and can be
determined numerically by the solution of the one body problem as discussed in the previous
section. When a large bubble moves sufficiently far ahead of a smaller bubble, the two are
forced to merge, with a new height set by conservation of mass. The merger height is then
determined numerically by a solution of the two body problem as discussed in the previous
section. In Fig. 5.1 we show a sequence of successive sample interfaces generated by the
numerical solution of this model, and in Fig. 5.2 we plot the average bubble velocity as a
function  of  time,  for  a  specific  choice  of  initial  data  consisting  of  a  Gaussian  distribution 
about  a  uniform  bubble  size.  One  can  see  clearly  the  trend  toward  merger  of  bubbles  and 
the  growth  of  larger  bubbles  at  the  expense  of  the  smaller  ones. 
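
To make the structure of such a merger model concrete, the sketch below advances a row of bubbles with the scaling law $v = \mathrm{const}\,(gr)^{1/2}$ and merges neighbours once one has moved far enough ahead. It is only a toy under stated assumptions and not the model of Ref. [17]: the merger threshold `lead_factor`, the value of `const`, and the rule that the merged height is the width-weighted average (so that the area under the piecewise constant interface is conserved) are all illustrative choices.

```python
import math


def bubble_velocity(r, g=1.0, const=0.5):
    """Scaling law v = const * sqrt(g * r); const is an assumed value."""
    return const * math.sqrt(g * r)


def step(bubbles, dt, g=1.0, const=0.5, lead_factor=1.0):
    """Advance and merge a row of bubbles given as (radius, height) pairs.

    Neighbours merge when their height difference exceeds
    lead_factor * (smaller radius). This criterion is hypothetical.
    """
    moved = [(r, h + bubble_velocity(r, g, const) * dt) for r, h in bubbles]
    merged = []
    for r, h in moved:
        if merged:
            r0, h0 = merged[-1]
            if abs(h - h0) > lead_factor * min(r, r0):
                rn = r0 + r  # widths add
                # Width-weighted height keeps sum(r * h) unchanged.
                merged[-1] = (rn, (r0 * h0 + r * h) / rn)
                continue
        merged.append((r, h))
    return merged
```

Because larger bubbles rise faster under the scaling law, repeated calls to `step` on an initial distribution of radii show the qualitative trend described above: growth of larger bubbles at the expense of smaller ones.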


6.  Front  Tracking  with  Source  Terms 

Gravitation  and  cylindrical  symmetry  introduce  source  terms  into  the  conservation  form 
of  the  Euler  equations.  In  this  section,  we  discuss  the  modifications  necessary  in  applying  the 
front  tracking  method  to  problems  with  gravity  or  cylindrical  symmetry. 

With  a  gravitational  force,  Eq.  (2.1)  is  modified  by  source  terms: 

\[ w_t + \nabla \cdot f(w) = S(w) \tag{6.1} \]

where

\[ S = \begin{pmatrix} 0 \\ \rho g \\ m \cdot g \end{pmatrix}. \tag{6.2} \]

In this case, $E$ stands for the internal plus kinetic energy density. The gravitational potential
energy density has been shifted from the left-hand side of Eq. (2.1) and appears as $m \cdot g$ in $S$.

The cylindrically symmetric Euler equations can be written in the form (6.1) with

\[ w = \begin{pmatrix} \rho \\ \rho u_r \\ \rho u_x \\ E \end{pmatrix} \]

provided it is understood that $\nabla \cdot f$ is to be interpreted with a flat metric in $(r, x)$ coordinates
as $\partial_r f_r + \partial_x f_x$, where $f_r$ and $f_x$ are the radial and axial components of the flux $f$.
With this interpretation,

\[ S = -\frac{u_r}{r} \begin{pmatrix} \rho \\ \rho u_r \\ \rho u_x \\ E + p \end{pmatrix}. \tag{6.3} \]

The interior solver and tangential sweep with source terms are modified only by including
$S$ in the finite difference equations. The Lax-Wendroff method remains second-order
accurate even with the source term $S$.

For  the  normal  and  tangential  sweeps  of  the  front,  the  Euler  Eqs.  (6.1)  are  split  into 
normal  and  tangential  parts: 

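\[ w_t + \hat{n} \cdot [(\hat{n} \cdot \nabla) f(w)] + \hat{t} \cdot [(\hat{t} \cdot \nabla) f(w)] = S_n + S_t \tag{6.4} \]

where for gravity

\[ S_n = \begin{pmatrix} 0 \\ \rho g_n \hat{n} \\ m_n g_n \end{pmatrix} \quad \text{and} \quad S_t = \begin{pmatrix} 0 \\ \rho g_t \hat{t} \\ m_t g_t \end{pmatrix}, \]

and for cylindrical symmetry

\[ S_n = -\frac{u_n n_r}{r} \begin{pmatrix} \rho \\ \rho u_r \\ \rho u_x \\ E + p \end{pmatrix} \quad \text{and} \quad S_t = -\frac{u_t t_r}{r} \begin{pmatrix} \rho \\ \rho u_r \\ \rho u_x \\ E + p \end{pmatrix}. \]

Here $\hat{n}$ and $\hat{t}$ are the unit normal and unit tangent to the front, the subscripts $n$ and $t$ on $g$, $m$
and $u$ denote components along $\hat{n}$ and $\hat{t}$, and $n_r$ and $t_r$ are the radial components of $\hat{n}$ and $\hat{t}$,
so that $S_n + S_t = S$ in each case.

The splitting method is first to solve the normal equations

\[ w_t + \hat{n} \cdot [(\hat{n} \cdot \nabla) f(w)] = S_n, \tag{6.5} \]

and then the tangential equations

\[ w_t + \hat{t} \cdot [(\hat{t} \cdot \nabla) f(w)] = S_t. \tag{6.6} \]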

Eq.  (6.6)  is  solved  by  a  one-dimensional  Lax-Wendroff  method.  The  normal  sweep  is 
further  modified  by  an  operator  splitting  method.  For  tracked  shocks,  the  solution  to  Eq. 
(6.5)  is  found  by  solving  a  nonlocal  Riemann  problem  [4]  for  the  homogeneous  equation 

\[ w_t + \hat{n} \cdot [(\hat{n} \cdot \nabla) f(w)] = 0, \]

and then the corrections are added by solving

\[ w_t = S_n. \]

For  through-flow  boundaries,  Eq.  (6.5)  is  solved  by  a  one-dimensional  Lax-Wendroff 
method. 

For  contacts  and  wall  boundaries  the  solution  of  the  nonlocal  Riemann  problem  in  the 
normal direction is modified to include the effects of source terms in the characteristic
equations. If $S = 0$ the states at a contact or wall boundary may be updated by solving the
characteristic equations

\[ dp \pm \rho c \, du = 0 \tag{6.7} \]

for characteristic wave speeds $u \pm c$. With a non-zero $S$, Eq. (6.7) becomes

\[ dp \pm \rho c \, du = S_n \, dt, \tag{6.8} \]

where $S_n = \pm \rho c \, g_n$ for gravity, and $S_n = -\frac{n_r}{r} u \rho c^2$ for cylindrical symmetry, with $n_r$ the
radial component of the unit normal $\hat{n}$. The finite difference form of Eq. (6.8) is

\[ p - p_0 \pm \rho c \, (u - u_0) = S_n \, dt, \tag{6.9} \]

where the unsubscripted variables indicate the quantities at the head of the characteristic (at
time $t + dt$), the subscript 0 indicates the quantities at the foot of the characteristic (at time $t$),
and $u = \hat{n} \cdot \mathbf{u}$.

The  node  propagation  algorithms  are  modified  through  Eqs.  (6.4)  and  (6.6);  the 
Rankine-Hugoniot  jump  relations  are  unchanged. 
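
The characteristic update of Eq. (6.9) can be illustrated for the gravitational case. The sketch below combines the forward characteristic arriving from one side with the backward characteristic arriving from the other and solves the resulting pair of linear equations for the common pressure and normal velocity at a contact. It is a simplified illustration rather than the nonlocal Riemann solver of Ref. [4]: the function name is hypothetical, and the acoustic impedance $\rho c$ is assumed frozen at the foot of each characteristic.

```python
def contact_update(left, right, dt, g_n=0.0):
    """Pressure and normal velocity at a contact after time dt.

    left and right are (p, u, rho, c) at the feet of the forward (+) and
    backward (-) characteristics. With gravity, S_n = +rho*c*g_n on the
    forward characteristic and -rho*c*g_n on the backward one.
    """
    pL, uL, rhoL, cL = left
    pR, uR, rhoR, cR = right
    zL, zR = rhoL * cL, rhoR * cR  # acoustic impedances, frozen at the feet
    # Forward:  p - pL + zL*(u - uL) = +zL*g_n*dt  ->  p + zL*u = aL
    # Backward: p - pR - zR*(u - uR) = -zR*g_n*dt  ->  p - zR*u = aR
    aL = pL + zL * uL + zL * g_n * dt
    aR = pR - zR * uR - zR * g_n * dt
    u = (aL - aR) / (zL + zR)
    p = (zR * aL + zL * aR) / (zL + zR)
    return p, u
```

For a uniform gas at rest this returns an unchanged pressure and a velocity of $g_n\,dt$, which is the free-fall result one would expect from the source term alone.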


7. Real Equations of State

Much work has recently been devoted to the problem of implementing realistic equations
of state for gas dynamical calculations. Commonly, scientific studies assume a polytropic
or gamma law gas equation of state; our goal is to extend our front tracking hydrodynamics
code to handle more general equations of state. Over the past year a considerable effort
was  made  to  isolate  and  modularize  the  equation  of  state  dependences  in  our  gas  dynamics 
simulation  program.  This  work  has  now  been  completed  and  the  equation  of  state  depen¬ 
dences  have  been  isolated  to  a  relatively  small  number  of  subprograms  such  as  the  calculation 
of  pressures  from  densities  and  energies  or  the  calculation  of  sound  speeds.  Furthermore 
these  subprograms  have  been  written  in  such  a  way  that  the  user  may  "plug  in"  additional 
equations of state as they are developed. We are now in the process of adding two additional
equations of state to our gas code in addition to the currently supported polytropic
equation  of  state.  These  are  the  so  called  stiffened  polytropic  equation  of  state  and  the  Los 
Alamos  National  Laboratory  table  look  up  equation  of  state  SESAME. 


An  equation  of  state  is  a  functional  relation  between  the  thermodynamic  variables  that 
describe the state of a gas. These variables include the density, pressure, temperature,
specific  internal  energy  and  the  specific  entropy  of  the  gas.  Only  two  of  these  variables  can 
be  independent  and  the  equation  of  state  describes  the  remaining  quantities  when  any  two  are 
given.  For  example  in  the  polytropic  equation  of  state  the  specific  internal  energy  e  is  given 


by  e 


(7  t  Dp 

isii 


where  p  and  p  are  the  pressure  and  density  of  the  gas  respectively  and  y  is 


a  dimensionless  constant  greater  than  one.  The  temperature  T  of  a  polytropic  gas  is  given  by 
the  ideal  gas  law  RT  -  •£•  where  J?  is  a  positive  constant.  The  stiffened  polytropic  equation 

of  state  is  a  generalization  of  the  polytropic  equation  of  state,  where  e  =  and 

PT  m  p  +  Po .  As  in  the  polytropic  model  R  >  0  and  y  >  1  are  constants.  The  additional 

constant  p0  as  0  has  the  dimensions  of  pressure.  If  p0  =  0  the  stiffened  polytropic  model 
reduces  to  the  polytropic  case.  Stiffened  polytropic  equations  of  state  have  been  used  to 
model  metals.  For  instance,  tungsten  may  be  modeled  with  y  ~  3.2  and  p0  ~  1  Mbar  over 
a  range  of  pressures  from  zero  to  seven  Mbor. 
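The "plug in" structure described above can be sketched in a few lines. The following Python fragment is only an illustration, not the authors' code: the function names are hypothetical, and it assumes the commonly used stiffened form $e = (p + \gamma p_0)/((\gamma - 1)\rho)$, for which the sound speed is $c^2 = \gamma (p + p_0)/\rho$. Setting `p0 = 0` recovers the polytropic case.

```python
import math


def specific_energy(rho, p, gamma, p0=0.0):
    """Specific internal energy e(rho, p); p0 = 0 gives the polytropic gas."""
    return (p + gamma * p0) / ((gamma - 1.0) * rho)


def pressure(rho, e, gamma, p0=0.0):
    """Pressure p(rho, e), obtained by solving the energy relation for p."""
    return (gamma - 1.0) * rho * e - gamma * p0


def sound_speed(rho, p, gamma, p0=0.0):
    """Sound speed from c^2 = gamma * (p + p0) / rho (assumed stiffened form)."""
    return math.sqrt(gamma * (p + p0) / rho)


# Round trip in arbitrary consistent units, using the tungsten-like
# constants quoted in the text (gamma about 3.2, p0 about 1 Mbar).
rho, p = 19.3, 2.0
e = specific_energy(rho, p, gamma=3.2, p0=1.0)
p_back = pressure(rho, e, gamma=3.2, p0=1.0)
```

A hydrodynamics code that calls only routines of this kind never needs to know which equation of state is behind them, which is the modularity the text describes.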


Both the polytropic and the more general stiffened polytropic equations of state are examples of simple analytic equations of state. Their implementation into a hydrodynamics code is relatively simple and involves the calculation of various quantities such as the sound speed and shock Hugoniots. Because of the simple nature of these models it is possible to find explicit formulas for these quantities, which allow for quick and accurate numerical calculations. Their main limitations are that real materials only approximately satisfy them over a limited range of temperatures and pressures, and that they do not include mechanisms for phase transitions. The Los Alamos National Laboratory program SESAME is an attempt to overcome these problems by using a tabular equation of state. Here we are given a rectangular grid of densities and temperatures, with the pressure and specific internal energy given at each grid point $(\rho, T)$. Pressures and energies at intermediate densities and temperatures can then be found by interpolation.
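As a rough illustration of a table look-up, the sketch below interpolates a quantity stored on a rectangular $(\rho, T)$ grid. The text does not say which interpolation scheme SESAME uses; bilinear interpolation is assumed here purely for simplicity, and the function name and the toy ideal-gas table are hypothetical.

```python
from bisect import bisect_right


def bilinear(xs, ys, table, x, y):
    """Interpolate table[i][j], given at (xs[i], ys[j]), to the point (x, y).

    xs and ys must be strictly increasing and (x, y) must lie in the grid.
    """
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        raise ValueError("point lies outside the table")
    # Locate the cell containing (x, y); clamp so the upper edge is allowed.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])


# Toy table: pressures of an ideal gas, p = R * rho * T, on a coarse grid.
R = 0.287
densities = [1.0, 2.0, 4.0]
temperatures = [100.0, 200.0, 400.0]
p_table = [[R * rho * T for T in temperatures] for rho in densities]

p_mid = bilinear(densities, temperatures, p_table, 3.0, 150.0)
```

Because $p = R\rho T$ is itself bilinear in $(\rho, T)$, the interpolated value in this toy case coincides with the exact pressure; for a real material table the result is only an approximation whose quality depends on the grid.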


One advantage of such a program is that it allows one to support a large number of materials using the same basic software. In addition, the table for an individual material may be built by combining several different analytic models, each with its own range of validity, or by using directly measured experimental information. However, such generality and flexibility exact a cost for a hydrodynamics code. Quantities which reduce to simple formulas for the polytropic model must now be found by solving systems of nonlinear equations or differential equations numerically. In particular, the calculation of shock Hugoniots and adiabatic (constant entropy) curves can become extremely expensive. Since in any code which involves the solution of Riemann problems (such as our front tracking code) these quantities must be computed hundreds or even thousands of times each timestep, it is easy to see that numerical simulations can be impractical on even the most advanced machines.

We are now in the process of developing an implementation of the SESAME program in our gas dynamics code which will address these efficiency problems by precomputing, as far as possible, the quantities which are used repeatedly in the solution of Riemann problems. The original SESAME program already included a facility for inverting the given tables into a format in which the density and specific internal energy are the independent variables. To this we are adding inverted forms with pressure and density, or pressure and specific entropy, as independent variables. In addition, it is possible to precompute various integrals which occur in the solution of the Riemann problem and include them as data in the table. Our hope is that by applying these principles we will be able to achieve rates of solution of Riemann problems which are comparable to those obtained for polytropic or other similar equations of state.
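The idea of inverting a table once, ahead of time, can be illustrated as follows. This is a minimal sketch under stated assumptions, not the SESAME facility itself: it assumes that along each density row the energy increases strictly with temperature, uses linear inverse interpolation, and all names are hypothetical.

```python
from bisect import bisect_right


def invert_row(temperatures, energies, e):
    """Return T with e(T) = e for one density row, by inverse interpolation.

    Assumes energies is strictly increasing along the row.
    """
    if not (energies[0] <= e <= energies[-1]):
        raise ValueError("energy lies outside the tabulated range")
    k = min(bisect_right(energies, e) - 1, len(energies) - 2)
    w = (e - energies[k]) / (energies[k + 1] - energies[k])
    return temperatures[k] + w * (temperatures[k + 1] - temperatures[k])


def build_inverted_table(temperatures, e_table, e_grid):
    """Precompute T(rho_i, e_k) from e_table[i][j] = e(rho_i, T_j)."""
    return [[invert_row(temperatures, row, e) for e in e_grid]
            for row in e_table]


# Toy data: e = c_v * T with c_v = 2, identical for both densities.
temperatures = [100.0, 200.0, 300.0]
e_table = [[200.0, 400.0, 600.0],
           [200.0, 400.0, 600.0]]
T_table = build_inverted_table(temperatures, e_table, e_grid=[300.0, 500.0])
```

The inversion cost is paid once when the table is built; during a simulation, a look-up in `T_table` replaces a root-finding step that would otherwise be repeated inside every Riemann solve.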
Fig. 3.1 Plots of the oil-water interfaces for a well configuration consisting of 19 injecting wells (crossed squares) and 12 producing wells (open squares). [Surviving panel label: (a) time step 0.]

Fig. 3.2 A shock hitting a contact discontinuity separating two masses of air at different temperature. The pressure ratio across the shock is 1000 and the density ratio across the contact discontinuity is approximately 2.86. The shock is incident in the lighter gas. [Surviving panel labels: (a) time 0, (c) time 0.5, (d) time 5.]

Fig. 3.3 A shock hitting a contact discontinuity separating air from the gas SF6. The contact discontinuity curve is given an initial shape of a sine curve. The shock is incident from the air and has a pressure ratio of 10. The boxed region in Fig. 3.3b is blown up in the next figure.

Fig. 3.4 A blowup of a subregion of Fig. 3.3b showing the incident shock colliding with the contact discontinuity ahead of it, producing reflected and transmitted shocks. [Surviving panel label: time 0.02.]

Fig. 4.1 A sequence of interface positions for a compressible heavy gas (below) falling into a lighter gas (above) with a density ratio of 2:1; in this case the terminal Mach number of the bubble and spike is about 0.2. Gravity points upward.

Fig. 4.2 Density contours for the Rayleigh-Taylor instability for the case of four symmetric bubbles and spikes with a density ratio of 10:1. [Surviving scale label: 10 Δx, 10 Δy.]

Fig. 4.4 Plots of a cylindrically symmetric Mach 3 jet. The density ratio of jet gas to ambient gas is 10:1. [Surviving panel labels: (a) time 0.7 interface plot, (b) time 0.7 pressure contours.]

Fig. 5.1 A sequence of successive sample interfaces generated by the numerical solution of a bubble growth model.


NONLINEAR  VISCOELASTIC  MATERIALS  WITH  FADING  MEMORY 
Madison, WI 53705

Abstract. The equations governing the motion of viscoelastic materials
with  fading  memory  incorporate  a  nonlinear  elastic-type  response  with  a 
natural  dissipative  mechanism.  Our  purpose  is  to  discuss  the  subtle  effects 
of  this  mechanism  in  viscoelastic  materials  of  Boltzmann  type.  Recent  results 
on  the  global  existence  and  decay  of  classical  solutions  for  smooth  and  small 
data  (in  one  space  dimension)  are  reviewed  for  smooth  and  singular  memory 
kernels;  for  smooth  kernels  a  number  of  such  results  can  be  generalized  to 
several  space  dimensions.  A  recent  result  on  the  development  of  singularities 
in  finite  time  for  large  data  is  discussed;  several  open  problems  are 
formulated. A program for studying weak solutions for such systems,
including  the  development  of  numerical  algorithms,  is  outlined. 

1.  Introduction.  The  equations  governing  the  motion  of  nonlinear 
elastic  bodies  are  quasilinear  hyperbolic  systems  for  which  smooth  solutions 
generally  lose  regularity  in  finite  time  due  to  the  formation  of  shock  fronts. 
Some  materials  incorporate  a  nonlinear  elastic-type  response  with  a  natural 
dissipative  mechanism,  and  it  is  important  to  understand  the  effects  of  the 
dissipation  on  the  behaviour  of  the  solutions  of  the  equations  of  motion. 

The  purpose  of  this  lecture  is  to  discuss  the  effects  of  the  subtle 
dissipative  mechanism  due  to  memory  effects  in  viscoelastic  materials  of 
Boltzmann  type.  This  dissipation  is  more  delicate  than  that  exhibited  by 
viscoelastic  materials  of  the  rate  type  for  which  globally  defined  smooth 
solutions  exist,  even  for  large  smooth  data. 

The  paper  is  organized  as  follows.  In  Section  2  we  formulate 
mathematical  models  for  the  motion  of  nonlinear  viscoelastic  materials  and  we 
motivate the mathematical theory. In Section 3 we survey recent results on
the  global  existence  of  smooth  solutions  for  smooth  and  small  data.  In 
Section  4  we  present  a  recent  result  on  the  breakdown  of  smooth  solutions  for 
large,  smooth  data  and  discuss  briefly  related  open  questions  including  those 
regarding  weak  solutions  and  numerical  methods  (Remarks  4.8).  We  restrict  our 
attention  throughout  to  one-dimensional  problems  and  provide  some  references 
for  multidimensional  problems.  Moreover,  we  consider  only  a  purely  mechanical 
theory,  i.e.  we  neglect  thermal  effects. 


2. Mathematical Models and Dynamic Problems. Consider the longitudinal motion of a homogeneous one-dimensional body (e.g. a bar of uniform cross-section) occupying an interval $B$ in a reference configuration, which we assume to be an equilibrium state, and having unit reference density. $B$ may be bounded or unbounded. Let $u(x,t)$ denote the displacement at time $t$ of a particle with reference position $x$ (i.e. $x + u(x,t)$ is the position at time $t$ of the particle at $x$). The strain, which measures local stretching, is defined by $\varepsilon := u_x(x,t)$. Let $\sigma$ denote the stress at time $t$ of the particle with reference position $x$ ($\sigma$ measures the contact force per unit area). The balance of linear momentum yields the equation of motion

$$u_{tt} = \sigma_x + f, \qquad x \in B,\ t > 0, \tag{2.1}$$

where subscripts denote partial derivatives and where $f$ is an external body force. In order to characterize the material, (2.1) is supplemented by a constitutive assumption which relates the stress to the motion. In addition, initial data, as well as suitable boundary data if $B$ is not $\mathbb{R}$, are adjoined to (2.1). We remark that in a physical problem the cross-section does not generally remain uniform as the bar is stretched. More realistic problems can be treated by similar techniques.

If the body is homogeneous and purely elastic, the stress depends on the strain through the constitutive relation $\sigma(x,t) = \phi(\varepsilon(x,t))$, where $\phi$ is a given smooth function satisfying the assumptions (i) $\phi(0) = 0$, (ii) $\phi'(0) > 0$; (i) reflects the fact that the reference position is taken as an equilibrium state, and (ii) that the stress increases with the strain, at least near equilibrium. The equation of motion (2.1) becomes the familiar one-dimensional quasilinear wave equation

$$u_{tt} = \phi(u_x)_x + f \qquad (x \in B,\ t > 0); \tag{2.2}$$

if $B$ is bounded it is assumed that the assigned boundary data and initial data are compatible. For (2.2) there is no natural dissipative mechanism. Indeed, Lax [33], also MacCamy and Mizel [37] and Klainerman and Majda [31], have shown that if $\phi$ is not linear, the Cauchy problem for (2.2) ($f \equiv 0$) does not generally possess globally defined smooth solutions, no matter how smooth and small one takes the initial data $u(x,0)$ and $u_t(x,0)$.

In a material with memory (such as certain polymers, suspensions, or emulsions) the stress at a material point $x$ and at time $t$ depends on the entire history of the strain at $x$. In 1874 Boltzmann [5] gave the following linear constitutive law for small deformations in such materials:

$$\sigma(x,t) = \beta\,\varepsilon(x,t) + \int_0^\infty m(s)\,[\varepsilon(x,t) - \varepsilon(x,t-s)]\,ds, \qquad x \in B,\ -\infty < t < \infty. \tag{2.3}$$

In (2.3) $\beta \ge 0$ is a given constant and $m : (0,\infty) \to \mathbb{R}$ is a given positive, smooth, nonincreasing function. We limit our discussion to the situation in which $m \in L^1(0,\infty)$, and we distinguish two cases:

$$\text{(i)}\ 0 < m(0) < \infty, \qquad \text{(ii)}\ m(0+) = +\infty. \tag{2.4}$$

The function $m$ is called a memory function. The fact that $m$ is positive and nonincreasing on $(0,\infty)$ means that the stress "relaxes" as $t$ increases and the memory term in (2.3) fades: deformations which occurred in the distant past have less influence on the present value of the stress than those which occurred in the recent past. In the applied literature $m$ is often assumed to be a finite linear combination of decaying exponentials with positive coefficients (these expressions result from least squares approximations to experimental data). Such restrictions are neither desirable nor necessary. Moreover, kinetic theories for chain molecules [15,46,53] and certain experiments [32,28] suggest that there are materials for which $m$ is singular as in (2.4)(ii): $m(t) \sim t^{\alpha-1}$ as $t \to 0+$, $0 < \alpha < 1$, $m$ is positive and nonincreasing on $0 < t < \infty$, and $m$ decays rapidly at infinity. Stronger power singularities at zero ($\alpha < 0$) are also possible, but the resulting mathematical theory for nonlinear materials consistent with our objectives is incomplete at this time.

The assumption $m \in L^1(0,\infty)$ implies that (2.3) is equivalent to

$$\sigma(x,t) = c\,\varepsilon(x,t) - \int_0^\infty m(s)\,\varepsilon(x,t-s)\,ds, \qquad x \in B,\ -\infty < t < \infty, \tag{2.5}$$

where $c := \beta + \int_0^\infty m(s)\,ds > 0$ is a constant which measures the instantaneous response of stress to strain; $\beta \ge 0$ is the equilibrium stress modulus. If $\beta > 0$ the material acts like a solid, while if $\beta = 0$ it acts like a fluid.
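The equivalence of the two forms of Boltzmann's law can be checked numerically. The sketch below is illustrative only: it assumes an exponential memory function $m(s) = m_0 e^{-\lambda s}$ and the strain history $\varepsilon(t) = e^{t}$ (both chosen here for convenience, not taken from the text), truncates the infinite integral at a hypothetical cutoff `s_max`, and uses a plain trapezoid rule.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total


def stress_difference_form(eps, m, beta, t, s_max=40.0, n=40000):
    """Stress from the form beta*eps + integral of m(s)[eps(t) - eps(t-s)]."""
    memory = trapezoid(lambda s: m(s) * (eps(t) - eps(t - s)), 0.0, s_max, n)
    return beta * eps(t) + memory


def stress_modulus_form(eps, m, beta, t, s_max=40.0, n=40000):
    """Stress from the form c*eps - integral of m(s) eps(t-s), c = beta + int m."""
    c = beta + trapezoid(m, 0.0, s_max, n)
    history = trapezoid(lambda s: m(s) * eps(t - s), 0.0, s_max, n)
    return c * eps(t) - history


m0, lam, beta, t = 2.0, 1.5, 1.0, 0.3
m = lambda s: m0 * math.exp(-lam * s)
eps = lambda tau: math.exp(tau)

sigma_a = stress_difference_form(eps, m, beta, t)
sigma_b = stress_modulus_form(eps, m, beta, t)
# Closed form for this particular kernel and history.
sigma_exact = math.exp(t) * (beta + m0 / lam - m0 / (lam + 1.0))
```

For these choices the instantaneous modulus is $c = \beta + m_0/\lambda$, and both quadrature results agree with the closed-form stress up to discretization error.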

A natural generalization of (2.5) to nonlinear materials is the constitutive relation

$$\sigma(x,t) = \phi(\varepsilon(x,t)) - \int_0^\infty m(s)\,\psi(\varepsilon(x,t-s))\,ds, \qquad x \in B,\ -\infty < t < \infty, \tag{2.6}$$

in which $\phi, \psi : \mathbb{R} \to \mathbb{R}$ are assigned, smooth material functions which satisfy

$$\phi(0) = \psi(0) = 0, \qquad \phi'(0) > 0, \qquad \psi'(0) > 0. \tag{2.7}$$

The memory function $m$ is positive, nonincreasing and integrable on $(0,\infty)$ as above. In the static case $\varepsilon(x,t) \equiv \bar\varepsilon(x)$, $\sigma(x,t) \equiv \bar\sigma(x)$, (2.6) reduces to

$$\bar\sigma(x) = \phi(\bar\varepsilon(x)) - \Big(\int_0^\infty m(s)\,ds\Big)\,\psi(\bar\varepsilon(x)), \qquad x \in B.$$

A natural assumption, appropriate for viscoelastic solids and crucial in the analysis of global existence results (Section 3), is to require that $\phi$, $\psi$ also satisfy

$$\phi'(0) - \Big(\int_0^\infty m(s)\,ds\Big)\,\psi'(0) > 0; \tag{2.8}$$

(2.8) states that the equilibrium stress modulus is positive. The constitutive assumption (2.6) is a particular case of a "simple material" [8] which retains many important qualitative properties of more general material models; moreover, the analysis of the resulting equation of motion is relatively simple and complete.

The balance of linear momentum and (2.6) yield the equation of motion

$$u_{tt} = \phi(u_x)_x - \int_{-\infty}^{t} m(t-\tau)\,\psi(u_x(x,\tau))_x\,d\tau + f, \qquad x \in B,\ -\infty < t < \infty, \tag{2.9}$$

where $f$ is a body force and where the change of variable $\tau := t - s$ was made in (2.6). The history of the motion is assumed to be known for $t \le 0$ (the history may, but need not, satisfy (2.9) for $t \le 0$). An appropriate dynamic problem is to find a smooth function $u : B \times (-\infty,\infty) \to \mathbb{R}$ satisfying (2.9) for $t > 0$, and such that

$$u(x,t) = \bar u(x,t), \qquad x \in B,\ t \le 0, \tag{2.10}$$

where the history $\bar u : B \times (-\infty,0] \to \mathbb{R}$ is a given smooth function; (2.9), (2.10) will be referred to as a history value problem. If $B$ is bounded or semibounded, compatible boundary conditions are adjoined to (2.9), (2.10). Compatibility of the boundary conditions with the smooth data $f$ and $\bar u$ is imposed in order to preclude the propagation of singularities from the boundary into the interior.

If $m \equiv 0$, (2.9) reduces to the quasilinear wave equation (2.2). At the other extreme, if one formally sets $m = -\delta'$, where $\delta$ is the Dirac mass at the origin, then (2.9) reduces to the parabolic equation

$$u_{tt} = \psi(u_x)_{xt} + \phi(u_x)_x + f;$$

the term $\psi(u_x)_{xt}$ represents viscosity of Newtonian type if $\psi$ is smooth and $\psi'(\cdot) > 0$. This equation possesses globally defined smooth solutions even if the data are large [1,34].

Our objective is to discuss the strength of the dissipative mechanism induced by the memory in (2.9) under physically reasonable assumptions by studying the existence and the decay or growth of classical solutions of the history value problem (2.9), (2.10). To motivate the mathematical results, we follow Coleman and Gurtin [6] in their penetrating study on the growth and decay of acceleration waves propagating into a one-dimensional viscoelastic material with memory at rest. An acceleration wave solution $u$ is similar to a shock wave; the difference is that second rather than first derivatives of $u$ experience a jump across the wave front. To apply the results of [6] to (2.9), (2.10), we assume that $\phi$, $\psi$ are smooth and satisfy (2.7), that $f \equiv 0$, $B = \mathbb{R}$, and that $m$ is a smooth, regular kernel satisfying (2.4)(i). The wave front is a smooth curve $t = \gamma(x)$, $\gamma(0) = 0$, and $u \equiv 0$ for $t < \gamma(x)$. In [6] the problem of existence of acceleration waves is not discussed. Assuming that they exist, an easy but tedious calculation shows that for (2.9) $t = \gamma(x)$ is a straight line of slope $(\phi'(0))^{-1/2}$, meaning that such waves propagate with constant speed although (2.9) is nonlinear. Let the amplitude of the wave be $q(t) := [u_{tt}]$, where $[u_{tt}]$ is the jump in $u_{tt}$ across the line $t = \gamma(x)$. It follows from the computations in [6] that $q$ evolves in accordance with the Riccati-Bernoulli equation

$$\frac{dq}{dt} = Aq^2 - Bq, \qquad q(0) = q_0, \tag{2.11}$$

where $\dfrac{d}{dt} = \dfrac{\partial}{\partial t} + c\,\dfrac{\partial}{\partial x}$, $c^2 = \phi'(0)$, represents differentiation along the wave front and where

$$A = \frac{-\phi''(0)}{2\,[\phi'(0)]^{3/2}}, \qquad B = \frac{m(0)\,\psi'(0)}{2\,\phi'(0)}.$$

Thus if $\phi''(0) < 0$ (similar results hold for $\phi''(0) > 0$) and $q_0 < B/A$, then every solution of (2.11) tends to zero as $t \to +\infty$. By contrast, if $q_0 > B/A$, then $q(t) \to +\infty$ as $t \to T_0$, where

$$T_0 = \frac{1}{B}\,\log\frac{q_0}{q_0 - B/A} > 0.$$

The corresponding jumps in $u_{xt}$ and $u_{xx}$ are given by $[u_{xt}] = -[\phi'(0)]^{-1/2}\,q(t)$ and $[u_{xx}] = [\phi'(0)]^{-1}\,q(t)$.
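The threshold behaviour of the amplitude equation can be made explicit. Reading (2.11) as $dq/dt = Aq^2 - Bq$ with $A, B > 0$, the substitution $w = 1/q$ gives the linear equation $w' = Bw - A$, hence a closed-form solution. The Python sketch below (function names are hypothetical) evaluates that solution and the blow-up time quoted in the text.

```python
import math


def amplitude(t, q0, A, B):
    """Solution of dq/dt = A*q**2 - B*q with q(0) = q0 (A, B > 0, q0 != 0)."""
    # w = 1/q solves w' = B*w - A, so w(t) = A/B + (1/q0 - A/B) * exp(B*t).
    w = A / B + (1.0 / q0 - A / B) * math.exp(B * t)
    return 1.0 / w


def blow_up_time(q0, A, B):
    """Time at which q becomes infinite, or None if q0 <= B/A (no blow-up)."""
    if q0 <= B / A:
        return None
    return math.log(q0 / (q0 - B / A)) / B


A, B = 1.0, 2.0                         # threshold amplitude B/A = 2
small = amplitude(10.0, 1.0, A, B)      # q0 below threshold: decays
T0 = blow_up_time(3.0, A, B)            # q0 above threshold: finite blow-up
near_T0 = amplitude(0.549, 3.0, A, B)   # just before T0: already very large
```

With these sample values the amplitude starting at $q_0 = 1$ has essentially vanished by $t = 10$, while the one starting at $q_0 = 3$ blows up at $T_0 = \tfrac12 \log 3$. This is the dichotomy behind the two conjectures that follow.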

This result suggests the following conjectures regarding smooth solutions of the history value problem (2.9), (2.10):

(i) The problem (2.9), (2.10) should have globally defined classical solutions if the history $\bar u$ and the forcing term $f$ are sufficiently smooth and small in appropriate norms. Moreover, such solutions should decay.

(ii) The smooth solutions of (2.9), (2.10) should develop singularities in second derivatives in finite time if the smooth data are chosen sufficiently large.


As will be summarized in Section 3, conjecture (i) has been established rigorously by a number of authors in a number of physically important cases of (2.9), (2.10) for regular kernels ($m(0) < \infty$), as well as for singular kernels ($m(0+) = +\infty$). Conjecture (ii) has only been established for regular kernels (see Section 4). Moreover, based on the discussion in Section 4, Remark 4.6, singular kernels $m$ strengthen the dissipative mechanism of the memory in (2.9), which suggests the possibility that for appropriate classes of singular kernels, global smooth solutions will exist even if the data are arbitrarily large; this interesting question is open.

Most of the results described in Sections 3 and 4 for smooth kernels satisfying (2.4)(i) apply to more general one-dimensional viscoelastic models with fading memory, e.g. a model for a solid K-BKZ material [29,2]:

$$u_{tt} = \phi(u_x)_x + \int_{-\infty}^{t} m(t-\tau)\,h(u_x(x,t),u_x(x,\tau))_x\,d\tau + f, \qquad x \in B,\ -\infty < t < \infty. \tag{2.12}$$

Here $\phi$, $m$, and $f$ are as in (2.9), while $h : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is a smooth material function, $h(p,p) = 0$, and the partial derivatives of $h$ satisfy appropriate sign conditions, at least at $(0,0)$. If $\phi \equiv 0$, (2.12) models a K-BKZ fluid.

Under suitable assumptions, the energy method for proving existence results in Section 3 and the method of characteristics used to prove blow-up results of Section 4 yield similar results for this case as well. The energy method can also be applied to prove existence for certain multidimensional viscoelastic problems with fading memory (e.g. [13, Sec. 4], [30]). However, to our knowledge, the existence results described in Section 3 for singular kernels satisfying (2.4)(ii) depend crucially on the special form of equation (2.9).


3. Existence of Classical Solutions. For discussion of the mathematical results it is convenient to renormalize the memory function $m$. Define the relaxation function $a$ by

$$a(t) := \int_t^\infty m(s)\,ds, \qquad 0 \le t < \infty; \tag{3.1}$$

observe that if $m$ is smooth, positive, decreasing and integrable on $[0,\infty)$ then $a'(t) = -m(t)$ and

$$a \text{ is smooth, positive, decreasing and convex on } (0,\infty). \tag{3.2}$$

Analogous to (2.4) we distinguish two classes of kernels $a$:

$$\text{(i)}\ 0 < -a'(0+) < \infty, \qquad \text{(ii)}\ -a'(0+) = +\infty. \tag{3.3}$$

Other normalizations of the memory $m$ are possible; for example, the relaxation function

$$G(t) := \phi'(0) - a(0)\,\psi'(0) + a(t)\,\psi'(0), \qquad 0 \le t < \infty, \tag{3.4}$$

where $\phi$, $\psi$ are the material functions in (2.6), is consistent with the applied literature. Observe that $G(\infty) = \phi'(0) - a(0)\,\psi'(0)$ and $G(0) = \phi'(0)$.
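For the exponential-sum memory functions common in the applied literature, the relaxation function has a closed form. The sketch below is an illustration with hypothetical names and sample coefficients: for $m(s) = \sum_i c_i e^{-\lambda_i s}$ it evaluates $a(t) = \sum_i (c_i/\lambda_i)\,e^{-\lambda_i t}$ and the modulus $G(t)$ of (3.4), given assumed values of $\phi'(0)$ and $\psi'(0)$.

```python
import math


def relaxation(t, coeffs, rates):
    """a(t) = integral from t to infinity of m(s) ds for an exponential sum m."""
    return sum(c / lam * math.exp(-lam * t) for c, lam in zip(coeffs, rates))


def relaxation_modulus(t, coeffs, rates, dphi0, dpsi0):
    """G(t) = phi'(0) - a(0) psi'(0) + a(t) psi'(0), as in (3.4)."""
    a0 = relaxation(0.0, coeffs, rates)
    return dphi0 - a0 * dpsi0 + relaxation(t, coeffs, rates) * dpsi0


coeffs, rates = [1.0, 0.5], [2.0, 1.0]   # m(s) = exp(-2s) + 0.5*exp(-s)
dphi0, dpsi0 = 3.0, 1.0                  # sample values of phi'(0), psi'(0)

a0 = relaxation(0.0, coeffs, rates)
G0 = relaxation_modulus(0.0, coeffs, rates, dphi0, dpsi0)
G_late = relaxation_modulus(50.0, coeffs, rates, dphi0, dpsi0)
```

Here $a(0) = 1$, so $G$ relaxes from the instantaneous modulus $G(0) = \phi'(0) = 3$ toward the equilibrium modulus $\phi'(0) - a(0)\psi'(0) = 2$, which is positive as required for a solid.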


Returning to the history value problem (2.9), (2.10), let the history $\bar u$ be identically zero for $t < 0$. One then seeks a solution of the initial value problem

$$u_{tt} = \phi(u_x)_x + \int_0^t a'(t-\tau)\,\psi(u_x(x,\tau))_x\,d\tau + f, \qquad x \in B,\ t > 0, \tag{3.5}$$

$$u(x,0) = u_0(x), \qquad u_t(x,0) = u_1(x), \qquad x \in B, \tag{3.6}$$

together with suitable and compatible boundary conditions if $B$ is not $\mathbb{R}$. If the history $\bar u$ is not zero for $t < 0$, the part of the integral in (2.9) on $(-\infty,0)$ is incorporated in $f$.


Global Existence of Classical Solutions. We next discuss global existence and asymptotic behaviour for the Cauchy problem (3.5), (3.6) with $B = \mathbb{R}$, for smooth, small data, and for regular kernels $a$ satisfying (3.2), (3.3)(i). To simplify the exposition, we make the hypothesis

$$a \in C^3[0,\infty), \quad (-1)^k a^{(k)}(t) \ge 0 \ \ (0 \le t < \infty;\ k = 0,1,2,3), \quad a' \not\equiv 0, \quad \text{and} \quad \int_0^\infty t\,a(t)\,dt < \infty. \tag{3.7}$$

The results hold under assumptions on $a$ considerably weaker than (3.7). The interested reader is referred to [13], [22], [24], and the survey paper [23] for the generalizations. The essential point is that the kernel $a$ satisfies $a, a', a'' \in L^1(0,\infty)$, the moment condition in (3.7), and is "strongly positive" on $[0,\infty)$. The result for $B$ bounded [13] is somewhat simpler than for the Cauchy problem (3.5), (3.6); in particular, the moment condition in (3.7) is only needed for the Cauchy problem (see remarks following Theorem 3.1 and the outline of its proof).


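Concerning $\phi$, $\psi$ assume

$$\phi, \psi \in C^3(\mathbb{R}), \quad \phi(0) = \psi(0) = 0, \quad \phi'(0) > 0, \quad \psi'(0) > 0, \quad \phi'(0) - a(0)\,\psi'(0) > 0; \tag{3.8}$$

the latter is the analogue of (2.8) in the present normalization. Assume that

$$\text{(i)}\ f, f_x, f_t \in C([0,\infty); L^2(\mathbb{R})) \cap L^\infty([0,\infty); L^2(\mathbb{R})) \quad \text{and} \quad \text{(ii)}\ f \in L^1([0,\infty); L^2(\mathbb{R})); \ f_x, f_t, f_{xt} \in L^2([0,\infty); L^2(\mathbb{R})), \tag{3.9}$$

and let $u_0$, $u_1$ satisfy

$$u_0 \in L^2_{\mathrm{loc}}(\mathbb{R}) \quad \text{and} \quad u_0', u_0'', u_0''', u_1, u_1', u_1'' \in L^2(\mathbb{R}). \tag{3.10}$$

To measure the size of the data define the quantities

$$U(u_0,u_1) := \int_{-\infty}^{\infty} \{u_0'^2 + u_0''^2 + u_0'''^2 + u_1^2 + u_1'^2 + u_1''^2\}(x)\,dx, \tag{3.11}$$

and

$$F(f) := \sup_{t \ge 0} \int_{-\infty}^{\infty} \{f^2 + f_x^2 + f_t^2\}(x,t)\,dx + \Big(\int_0^\infty \Big(\int_{-\infty}^{\infty} f^2(x,t)\,dx\Big)^{1/2} dt\Big)^2 + \int_0^\infty\!\!\int_{-\infty}^{\infty} \{f_x^2 + f_t^2 + f_{xt}^2\}(x,t)\,dx\,dt. \tag{3.12}$$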

The following result is a special case of Theorem 1.1 of [24].

Theorem 3.1. Let assumptions (3.7) - (3.10) be satisfied. There exists a constant $\mu > 0$ such that for each $u_0$, $u_1$, $f$ satisfying

$$U(u_0,u_1) + F(f) \le \mu^2, \tag{3.13}$$

the Cauchy problem (3.5), (3.6) has a unique solution $u \in C^2(\mathbb{R} \times [0,\infty))$, and

$$u_x, u_t, u_{xx}, \ldots, u_{ttt} \in C([0,\infty); L^2(\mathbb{R})) \cap L^\infty([0,\infty); L^2(\mathbb{R})). \tag{3.14}$$

Moreover,

$$u_{xx}, u_{xt}, \ldots, u_{ttt} \in L^2([0,\infty); L^2(\mathbb{R})), \tag{3.15}$$

$$u_{xx}, u_{xt}, u_{tt} \to 0 \ \text{ in } L^2(\mathbb{R}) \text{ as } t \to \infty, \tag{3.16}$$

$$u_x, u_t, u_{xx}, u_{xt}, u_{tt} \to 0 \ \text{ uniformly on } \mathbb{R} \text{ as } t \to \infty. \tag{3.17}$$

A similar result holds for the history value problem (2.9), (2.10) with $B = \mathbb{R}$. The special case $a(t) = \alpha e^{-\lambda t}$, $\alpha > 0$, $\lambda > 0$, studied by Greenberg [18] for $B$ bounded, is carried out in [23] in the more complicated case when $B = \mathbb{R}$.


Remark  3.2.  Theorem  3.1  is  a  generalization  of  Theorems  1.1  and  4.1  of  [13] 
establishing  small-data  global  existence  results  for  analogous  initial 
boundary  value  problems  corresponding  to  motions  of  bounded  viscoelastic 
bodies;  Neumann,  Dirichlet  and  mixed  boundary  conditions  are  treated.  The 
principal difficulty in proving Theorem 3.1 is that various Poincaré
inequalities,  not  applicable  to  (3.5),  (3.6)  when  B  =  R,  are  used  in  an 
essential  way  in  [13]  to  establish  an  a  priori  estimate  similar  to  (3.26)  from 
(3.32)  (see  outline  of  proof  following  Proposition  3.4);  the  estimate  (3.26) 
is  essential  for  completing  the  proof.  The  reader  is  referred  to  Hrusa  [22] 
for  a  discussion  of  general  history  value  problems  on  a  bounded  interval. 
Although  technically  extremely  complicated,  the  generalization  of  the  results 
in  [22]  to  the  Cauchy  problem  is  relatively  straightforward. 
Remark 3.3. If $\psi \equiv \phi$, equations of the form (3.5) have been studied by MacCamy [35], Dafermos and the author [12], and Staffans [49] for bounded and unbounded bodies. If $\psi \equiv \phi$, (3.5) admits certain estimates which do not carry over to the general case $\psi \not\equiv \phi$ (see [23]); there does not appear to be any physical motivation for the restriction $\psi \equiv \phi$ for solids.


Outline of the proof of Theorem 3.1. An essential ingredient of any global result is an appropriate local existence theorem. For regular kernels $a$ satisfying (3.2), (3.3)(i), the idea is to iterate the sequence of linear problems which treat the memory as a lower-order perturbation:

$$u_{tt} = \phi'(w_x)\,u_{xx} + \int_0^t a'(t-\tau)\,\psi(w_x(x,\tau))_x\,d\tau + f, \qquad x \in \mathbb{R},\ 0 \le t \le T, \tag{3.18}$$

where $T > 0$, $u$ satisfies the initial conditions (3.6), and where $w$ is an element of a suitably chosen function space $X$. By using fairly standard energy estimates deduced from (3.18), requiring only very simple estimates of the convolution term which do not use any sign information on the memory, it is shown that the mapping $S$ which carries $w$ into a solution of (3.18) has a unique fixed point for $T > 0$ sufficiently small. The proof is almost identical with that of Theorem 2.1 of [13]. The only significant difference is that the proof in [13] is for $x \in [0,1]$ with Neumann boundary conditions satisfied at $x = 0$ and $x = 1$; thus the Poincaré inequality enables one to deduce estimates for lower order derivatives of $u$ in $L^\infty([0,T]; L^2(0,1))$ from higher order derivative estimates. As far as local existence is concerned when $B = \mathbb{R}$, this causes no serious difficulties. One simply expresses the lower order derivatives of the solution in terms of initial conditions and time integrals of the higher order derivatives, yielding time dependent bounds which, however, cannot be used for obtaining global estimates. The result is:


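Proposition 3.4. Let $a, a', a'' \in L^1_{\mathrm{loc}}[0,\infty)$ and assume that $\phi, \psi \in C^3(\mathbb{R})$, $\phi'(0) > 0$, and that there exists a number $\underline{\phi} > 0$ such that

$$\phi'(\xi) \ge \underline{\phi} \quad \text{for every } \xi \in \mathbb{R}. \tag{3.19}$$

Concerning the data, let $u_0$, $u_1$ satisfy (3.10), $f$ satisfy (3.9)(i), and assume that $f_{xt} \in L^1([0,\infty); L^2(\mathbb{R}))$. Then the Cauchy problem (3.5), (3.6) has a unique solution $u$ defined on a maximal time interval $[0,T_0)$ satisfying

$$u_x, u_t, u_{xx}, u_{xt}, u_{tt}, u_{xxx}, u_{xxt}, u_{xtt}, u_{ttt} \in C([0,T_0); L^2(\mathbb{R})). \tag{3.20}$$

By Sobolev embedding, $u \in C^2(\mathbb{R} \times [0,T_0))$. Moreover, if

$$\sup_{t \in [0,T_0)} \int_{-\infty}^{\infty} \{u_x^2 + u_t^2 + \cdots + u_{xtt}^2 + u_{ttt}^2\}(x,t)\,dx < \infty, \tag{3.21}$$

then $T_0 = +\infty$.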

In outline, the proof of the global result then proceeds as follows. Define the equilibrium stress $\chi$ by

$$\chi(\xi) := \phi(\xi) - a(0)\,\psi(\xi), \qquad \xi \in \mathbb{R}; \tag{3.22}$$

observe that $\chi \in C^3(\mathbb{R})$ and that $\chi'(0) > 0$ (by (3.8)). Choose a sufficiently small number $\delta > 0$ and modify $\phi$, $\psi$, and $\chi$ outside $[-\delta,\delta]$ such that $\phi''$, $\psi''$, $\chi''$ vanish outside $[-2\delta,2\delta]$, and choose positive constants $\underline{\phi}$, $\underline{\psi}$, $\underline{\chi}$ such that

$$\phi'(\xi) \ge \underline{\phi}, \qquad \psi'(\xi) \ge \underline{\psi}, \qquad \chi'(\xi) \ge \underline{\chi} \qquad \forall \xi \in \mathbb{R}. \tag{3.23}$$

It is shown a posteriori that $|u_x(x,t)| \le \delta$ for all $x \in \mathbb{R}$, $t \ge 0$. By Proposition 3.4 the Cauchy problem (3.5), (3.6) with $B = \mathbb{R}$ has a unique solution $u$ on a maximal interval $[0,T_0)$. The objective is to show that if (3.13) holds with $\mu > 0$ sufficiently small, then (3.21) is bounded independent of $T_0$; a standard continuation procedure implies $T_0 = +\infty$. Define

$$E(t) := \max_{s \in [0,t]} \int_{-\infty}^{\infty} \{u_x^2 + u_t^2 + \cdots + u_{ttt}^2\}(x,s)\,dx + \int_0^t\!\!\int_{-\infty}^{\infty} \{u_{xx}^2 + u_{xt}^2 + \cdots + u_{ttt}^2\}(x,s)\,dx\,ds, \tag{3.24}$$

where $\cdots$ represent the sum of the second and third derivatives not explicitly written down. It is shown that if (3.13) holds for $\mu > 0$ sufficiently small, then $E(t)$ is bounded. For this purpose define

$$\nu(t) := \sup_{x \in \mathbb{R},\ s \in [0,t]} \{u_x^2 + u_{xx}^2 + u_{xt}^2\}^{1/2}(x,s), \qquad \forall t \in [0,T_0). \tag{3.25}$$

To prove the result one establishes the following key estimate:

$$E(t) \le \Gamma\,\big(U(u_0,u_1) + F(f)\big) + \Gamma\,\big(\nu(t) + \nu^3(t)\big)\,E(t), \qquad 0 \le t < T_0, \tag{3.26}$$

where here and below $\Gamma$ is a generic constant, possibly large, independent of $u_0$, $u_1$, $f$, and $T_0$. We comment only briefly below on how this is accomplished.


Once  (3.26)  is  established,  the  conclusions  of  Theorem  3.1  are  obtained 
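as follows. Choose $\bar{E}$, $\mu > 0$ such that

$$\bar{E} \le \delta^2, \qquad \Gamma\{(2\bar{E})^{1/2} + (2\bar{E})^{3/2}\} \le \tfrac{1}{2}, \qquad \Gamma\mu \le \tfrac{1}{4}\bar{E}. \tag{3.27}$$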


Select the data $u_0$, $u_1$, $f$ such that (3.13) holds for $\mu$ chosen in
accordance with (3.27). The Sobolev embedding theorem implies that

$$\nu(t) \le (2E(t))^{1/2} \qquad \forall t \in [0,T_0). \tag{3.28}$$
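
A brief sketch of why an embedding bound of the form (3.28) holds may be helpful. It assumes (our reading, for illustration only) that $\nu$ collects the pointwise values of $u_x$, $u_{xx}$, $u_{xt}$ and that $E$ dominates the $L^2(\mathbb{R})$ norms of these functions and of their $x$-derivatives. The symbol $p$ is a placeholder introduced only for this sketch.

```latex
% For p in H^1(R) one has p(y) -> 0 as y -> -infinity, hence
p(x)^2 = 2\int_{-\infty}^{x} p(y)\,p_y(y)\,dy
       \le \int_{-\infty}^{\infty} \{p^2 + p_y^2\}(y)\,dy .
% Apply this with p = u_x, u_{xx}, u_{xt} at a fixed time s and add:
\{u_x^2 + u_{xx}^2 + u_{xt}^2\}(x,s)
  \le \int_{-\infty}^{\infty}
      \{u_x^2 + 2u_{xx}^2 + u_{xt}^2 + u_{xxx}^2 + u_{xxt}^2\}(y,s)\,dy
  \le 2E(s) \le 2E(t), \qquad s \in [0,t].
```

Taking the supremum over $x$ and $s$ and then the square root gives $\nu(t) \le (2E(t))^{1/2}$ under the stated assumptions.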


Therefore, it follows from (3.26), (3.27), (3.28) that for any $t \in [0,T_0)$
with $E(t) \le \bar{E}$, we actually have $E(t) \le \tfrac{1}{2}\bar{E}$. By continuity
$E(t) \le \tfrac{1}{2}\bar{E}$ $\forall t \in [0,T_0)$, provided $E(0) \le \tfrac{1}{2}\bar{E}$; the latter is ensured by
choosing $\mu$ smaller if necessary so that (3.13) will imply $E(0) \le \tfrac{1}{2}\bar{E}$. Then
$E(t) \le \tfrac{1}{2}\bar{E}$ $\forall t \in [0,T_0)$, and (3.24), Proposition 3.4, and a standard
continuation method yield $T_0 = +\infty$. One also has that (3.14), (3.15) hold,
and conclusions (3.16), (3.17) follow by standard embedding inequalities.
Moreover, (3.25), (3.27), (3.28) yield

$$|u_x(x,t)| \le \nu(t) \le (2E(t))^{1/2} \le (\bar{E})^{1/2} \le \delta, \qquad \forall x \in \mathbb{R},\ t \in [0,\infty),$$

and the proof is complete.


Establishing the estimate (3.26) is lengthy, delicate, and relies on the
correct sign of the memory [under assumption (3.7) or certain generalizations].
The energy method, combined with relevant properties of Volterra operators and
their resolvents, is employed. The estimates of derivatives of $u$ appearing
in (3.24) are deduced from energy identities obtained directly from (3.5),
(3.6), and from the equation equivalent to (3.5):

$$u_{tt} = \chi(u_x)_x + \int_0^t a(t-\tau)\,\psi(u_x)_{x\tau}(x,\tau)\,d\tau
 + a(t)\,\psi(u_{0x}(x))_x + f \qquad (x \in \mathbb{R},\ 0 \le t \le T), \tag{3.29}$$


where $T < T_0$; (3.29) is obtained from (3.5), (3.6) by an integration by parts
and use of (3.22). Useful identities for derivatives of $u$ can only be obtained
by multiplying the equations by quantities which make it possible to estimate the
memory terms. A crucial role is played by the "quadratic integral form"

$$Q(w,t,b) := \int_0^t\!\!\int_{-\infty}^{\infty} w(x,s) \int_0^s b(s-\tau)\,w(x,\tau)\,d\tau\,dx\,ds, \qquad t \ge 0,$$

defined for $b \in L^1[0,\infty)$ and for every $w \in C([0,t);\, L^2(\mathbb{R}))$. In the first
energy identity, which is obtained by multiplying (3.29) by $\psi(u_x)_{xt}$ and
integrating the equation over $\mathbb{R}\times[0,T]$, $Q$ arises with $w = \psi(u_x)_{xt}$ and
$b = a$. It is an important fact that kernels $a$ satisfying (3.7) (indeed much
weaker assumptions) are positive definite on $[0,\infty)$. To obtain the second
energy identity, one needs to take the forward time-difference of (3.29) and
integrate the resulting equation over $\mathbb{R}\times[0,T]$. To estimate the relevant
derivatives of $u$ from a combination of the first two identities one needs
the following technical estimate: it is shown in [24, Lemma 2.5] that if $a$
satisfies (3.7), there exists a constant $\kappa > 0$ such that

$$\int_0^t\!\!\int_{-\infty}^{\infty} w^2(x,s)\,dx\,ds \le \kappa \int_{-\infty}^{\infty} w^2(x,0)\,dx + \kappa\,Q(w,t,a)
 + \kappa \liminf_{h\downarrow 0} \frac{1}{h^2}\,Q(\Delta_h w,t,a), \qquad \forall t \in [0,T], \tag{3.30}$$

where $w \in C([0,T];\, L^2(\mathbb{R}))$ $\forall T > 0$, and where the forward difference
operator $\Delta_h$ is defined by $\Delta_h w(x,t) := w(x,t+h) - w(x,t)$. In the
application of (3.30), $w = \psi(u_x)_{xt}$ and the forward difference operator $\Delta_h$
is applied to equation (3.29). The proof of (3.30) also makes use of a
result of Staffans ([49, Lemma 4.2]). Using the two energy identities and
(3.30), it is relatively straightforward to estimate all of the terms and
arrive at:


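$$\int_{-\infty}^{\infty} \{u_{xx}^2 + u_{xt}^2 + u_{xxt}^2 + u_{xtt}^2\}(x,t)\,dx + \int_0^t\!\!\int_{-\infty}^{\infty} u_{xxt}^2(x,s)\,dx\,ds$$
$$\le \Gamma(U_0 + F) + \Gamma(\nu(t) + \nu^3(t))E(t) + \Gamma(\sqrt{U_0} + \sqrt{F})\sqrt{E(t)}, \qquad \forall t \in [0,T]. \tag{3.31}$$

Estimates of $\int_{-\infty}^{\infty} u_{tt}^2(x,t)\,dx$, $\int_{-\infty}^{\infty} u_{ttt}^2(x,t)\,dx$, $\int_0^t\!\int_{-\infty}^{\infty} u_{ttt}^2(x,\tau)\,dx\,d\tau$, $\forall t \in [0,T)$,
in terms of the right side of (3.31) are obtained from (3.5). A bound for
$\int_0^t\!\int_{-\infty}^{\infty} u_{xtt}^2(x,s)\,dx\,ds$ can then be obtained by interpolation. Using the fact
that a certain resolvent kernel of $a'$ in (3.5) is in $L^1[0,\infty)$, Lemma 3.2
of [13] makes it possible to estimate $\int_{-\infty}^{\infty} u_{xxx}^2(x,t)\,dx$ and
$\int_0^t\!\int_{-\infty}^{\infty} u_{xxx}^2(x,s)\,dx\,ds$. Combining these with (3.31) yields the estimate

$$\int_{-\infty}^{\infty} \{u_{xx}^2 + u_{xt}^2 + u_{tt}^2 + u_{xxx}^2 + u_{xxt}^2 + u_{xtt}^2 + u_{ttt}^2\}(x,t)\,dx
 + \int_0^t\!\!\int_{-\infty}^{\infty} \{u_{xxx}^2 + u_{xxt}^2 + u_{xtt}^2 + u_{ttt}^2\}(x,s)\,dx\,ds$$
$$\le \Gamma(U_0 + F) + \Gamma(\nu(t) + \nu^3(t))E(t) + \Gamma(\sqrt{U_0} + \sqrt{F})\sqrt{E(t)}, \qquad \forall t \in [0,T]. \tag{3.32}$$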

The  estimate  (3.32)  is  implicit  in  the  argument  of  [13].  It  should  be 
observed  that  for  problems  on  bounded  intervals  (Remark  3.2),  it  is  a  simple 
matter to apply the Poincaré inequality to deduce the remaining estimates of
derivatives of $u$ appearing in (3.24) and arrive at the final estimate (3.26)
directly from (3.32). However, to accomplish this task for (3.5), (3.6)
when $B = \mathbb{R}$ is quite tricky and involves additional properties of Volterra
operators and certain other of their resolvents. The reader is referred to
Lemmas 2.3 and 2.4, as well as the argument on pages 405-410 of [24], for
details. This part of the proof makes essential use of the assumption
$a'' \in L^1[0,\infty)$, which is automatic when $a$ satisfies (3.7) but cannot be
satisfied by singular kernels.

For singular kernels satisfying (3.2) and (3.3)(ii), it is simpler to
restrict the analysis to the history value problem (2.9), (2.10), with $a$
defined by (3.1), in which the history satisfies the equation (and the
boundary  conditions  if  B  is  bounded).  This  ensures  that  the  compatibility 
conditions  between  the  history  and  boundary  data,  as  well  as  compatibility 
conditions  between  the  derivatives  of  the  history  and  the  solution  for  t  >  0 
are  satisfied.  If  u  is  a  smooth  solution  of  (2.9)  and  the  kernel  a  is 
singular, the integral in (2.9) is also a smooth function, but the integrals
$\int_{-\infty}^{0}$ and $\int_0^t$ have singularities at $t = 0$ which cancel. Thus if formulated
as  an  initial  value  problem  the  results  would  involve  a  singular  forcing 
term.  For  reasons  explained  below,  global  existence  results  for  singular 
kernels  only  hold  for  B  bounded. 

The  principal  difficulty  when  dealing  with  singular  kernels  is 
establishing  a  suitable  local  existence  result.  In  Proposition  3.4  for 
regular  kernels  no  hypothesis  is  made  concerning  the  sign  of  the  memory  and 
the  size  of  the  data.  In  the  proof  the  memory  is  treated  as  a  perturbation  of 
the elastic term $\phi(u_x)_x$ in (3.5). However, the proof makes crucial use of
the hypothesis $a'' \in L^1_{\mathrm{loc}}[0,\infty)$, which rules out singular kernels $a$
satisfying (3.2), (3.3)(ii).

Hrusa  and  Renardy  [25,  Theorem  4.1]  recently  obtained  an  elegant 
extension  of  Proposition  3.4  for  such  singular  kernels.  They  consider  the 
history  value  problem  with  the  history  satisfying  the  equation  and  the 
boundary conditions for $t \le 0$. The singular kernel $a$ satisfies the
assumptions

$$a,\ a' \in L^1(0,\infty); \qquad a(t) \ge 0,\ a'(t) \le 0,\ a''(t) \ge 0, \quad 0 < t < \infty, \tag{3.33}$$

in the sense of measures, and $a''$ is not a purely singular measure; a certain
assumption on the Laplace transform of $a$ is imposed in order to guarantee
that the third derivatives of $u$ are continuous with values in $L^2(0,1)$. The
material function $\psi$ is also required to satisfy $\psi'(0) > 0$, and the
technical  assumptions  regarding  the  forcing  function  f  are  strengthened. 

The sign of the memory now plays a crucial role in the local analysis, in which
one iterates a sequence of linear integrodifferential equations (compare with
(3.18))

$$u_{tt} = \phi'(w_x)\,u_{xx} + \int_{-\infty}^{t} a'(t-\tau)\,\psi'(w_x)\,u_{xx}(x,\tau)\,d\tau + f, \tag{3.34}$$

where $u$ coincides with the given history for $t \le 0$, and where $w$ is an element of an
appropriately chosen function space. The singular kernel $a$ satisfying (3.33)
is replaced in (3.34) by regular kernels $a_\delta$ defined by

$$a_\delta(t) := \int \rho_\delta(\tau)\,a(t+\delta+\tau)\,d\tau, \qquad 0 \le t < \infty,\ \delta > 0,$$

where $\rho_\delta$ is a standard mollifier supported in $[-\delta/2, \delta/2]$. The analysis
with singular kernels is far more complicated because $a'' \notin L^1[0,\infty)$, and
$\|a_\delta''\|_{L^1[0,\infty)}$ does not necessarily remain bounded as $\delta \to 0$. The energy estimates
are also considerably more delicate, and to obtain them certain technical
lemmas concerning Volterra operators with kernels $a$ satisfying (3.33) are
required (such kernels are known to be strongly positive definite [43]). It
is first shown that each linear problem (3.34) has a unique solution having
the required regularity by justifying passage to the limit as $\delta \to 0$. Then a
contraction mapping argument for (3.34) is used in [25] to obtain the analogue
of Proposition 3.4 for $w$ belonging to an appropriate function space. The
proof in [25] is carried out for $B = [0,1]$ with Dirichlet boundary
conditions satisfied at $x = 0$ and $x = 1$; it is straightforward to obtain a
similar local result for $B = \mathbb{R}$, because the local existence proof in [25]
avoids the use of Poincaré inequalities.
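
The effect of the regularization $a_\delta$ can be explored numerically. The following is a minimal sketch, not taken from [25]: the kernel $a(t) = t^{-1/2}e^{-t}$, the particular bump function, and all function names are assumptions chosen for illustration. For this convex, decreasing kernel one has $\int_0^\infty |a_\delta''| = -a_\delta'(0)$, so the growth of the one-sided slope at $t = 0$ as $\delta \to 0$ indicates the loss of a uniform $L^1$ bound on $a_\delta''$.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def singular_kernel(t):
    """Illustrative kernel a(t) = t^(-1/2) exp(-t); a(0+) = +infinity (an assumption)."""
    return np.exp(-t) / np.sqrt(t)

def mollified_kernel(t, delta, n=4001):
    """a_delta(t) = integral of rho_delta(tau) * a(t + delta + tau) d tau."""
    # Interior nodes of [-delta/2, delta/2]; the bump vanishes at the endpoints.
    tau = np.linspace(-delta / 2.0, delta / 2.0, n + 2)[1:-1]
    bump = np.exp(-1.0 / (1.0 - (2.0 * tau / delta) ** 2))
    rho = bump / trapezoid(bump, tau)  # normalise the mollifier to unit mass
    return trapezoid(rho * singular_kernel(t + delta + tau), tau)

def slope_at_zero(delta, h=1e-6):
    """One-sided difference quotient approximating a_delta'(0)."""
    return (mollified_kernel(h, delta) - mollified_kernel(0.0, delta)) / h

if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, mollified_kernel(0.0, delta), slope_at_zero(delta))
```

Each $a_\delta$ is finite and smooth at $t = 0$ because the argument $t + \delta + \tau$ stays at least $\delta/2$ away from the singularity, but both $a_\delta(0)$ and $-a_\delta'(0)$ grow as $\delta$ decreases.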

Using  their  local  result,  Hrusa  and  Renardy  then  obtain  an  analogue  of 
Theorem  3.1  for  the  history  value  problem  (2.9),  (2.10)  and  the  (singular) 
kernel  a,  defined  by  (3.1),  satisfying  (3.33)  on  bounded  intervals.  They 
impose  the  requirement  that  the  history  and  the  solution  satisfy  Dirichlet 
boundary conditions at $x = 0$ and $x = 1$ and that the history and forcing
term be suitably small. Their result ([25, Theorem 5.1]) is then a simple
extension of the proof of [13, Theorem 1.1] involving the modification of only
one estimate in [13]; the modification uses a refinement of Lemma 4.2 in [49],
because $a'' \notin L^1[0,\infty)$ whenever $a$ is singular. The fact that
$a'' \notin L^1[0,\infty)$ makes it difficult to prove Theorem 3.1 for singular kernels
using  the  analysis  in  [25].  It  is  a  challenging  open  problem  to  prove  such  a 
result  for  singular  kernels  on  all  of  space. 


4.  Development  of  Singularities  and  Related  Problems.  In  this  section  we 
consider  the  Cauchy  problem  (3.5),  (3.6)  for  regular  kernels  a,  and  we 
discuss the development of singularities in smooth solutions in finite time
for smooth but large data by using the method of characteristics. To avoid
technical complications we assume that the forcing term $f \equiv 0$ in (3.5), and
we study

$$u_{tt} = \phi(u_x)_x + a' * \psi(u_x)_x, \qquad x \in \mathbb{R},\ t > 0, \tag{4.1}$$

$$u(x,0) = u_0(x), \quad u_t(x,0) = u_1(x), \qquad x \in \mathbb{R}, \tag{4.2}$$

where $*$ denotes the time convolution on $[0,t]$. The following result was
recently established by M. Renardy and the author [42], and independently by
Dafermos [10] for general memory functionals using a somewhat different
proof. The result can also be established by extending techniques of F. John
[27] to quasilinear, first-order hyperbolic systems with lower order source
terms; however, the approach outlined below is more direct.

Theorem 4.1. Let $\phi, \psi \in C^3(\mathbb{R})$ satisfy (3.19) and let $a$ be
smooth with $a, a', a'' \in L^1_{\mathrm{loc}}[0,\infty)$. In addition, let $\phi''(0) \ne 0$. Then for

1  2  00 

every  T^  >  0,  there  exists  initial  data  Uq,  u^  c  C  (R)  O  L  (R)  such  that 
the maximal interval of existence of the smooth solution $u$ of the Cauchy
problem (4.1), (4.2) cannot exceed $T_1$. More precisely, if $\sup_{x\in\mathbb{R}} |u_0'(x)|$ and
$\sup_{x\in\mathbb{R}} |u_1(x)|$ are sufficiently small, while $u_0''(x)$ and $u_1'(x)$ are
sufficiently large (with appropriate signs), then there exists a number
$t^* \le T_1$ such that

$$\sup_{\mathbb{R}\times[0,t^*)} \{|u_{xx}(x,t)| + |u_{xt}(x,t)|\} = \infty, \tag{4.3}$$

while

$$\sup_{\mathbb{R}\times[0,t^*)} \{|u_x(x,t)| + |u_t(x,t)|\} < \infty. \tag{4.4}$$

For the special case $\psi = \phi$, Hattori [21] has shown that if $\phi'' \ne 0$ and
if the body $B$ is bounded, then there exist data $u_0$, $u_1$ such that the
initial-boundary value problem (consisting of (4.1), (4.2) and compatible
Dirichlet boundary conditions) does not have a globally defined smooth
solution. However, his method does not enable him to characterize the data.
Ramaha  [45]  has  recently  obtained  a  blow-up  result  when  =  $. 

For  first-order  model  problems  with  fading  memory,  blow-up  results 
similar  to  Theorem  4.1  have  been  obtained  by  a  number  of  authors  ([38],  [36], 
[9])  by  the  method  of  characteristics.  Existence  of  classical  solutions  for 
small data for such models is discussed in [41]. The elegant method of
Dafermos  [9]  avoids  use  of  characteristics;  instead  a  maximum  principle  is 
obtained  and  used. 

Remark 4.2. The reader should observe that in Theorem 4.1 only the additional
hypothesis $\phi''(0) \ne 0$ is added to the assumptions guaranteeing the existence
of  a  local  smooth  solution  of  (4.1),  (4.2)  (Proposition  3.4).  No  sign 
information  on  the  kernel  a  is  required.  Assumption  (3.19)  is  not 
restrictive  because  it  is  shown  that  the  supremum  in  (4.4)  is  in  fact  small. 

The  proof  of  Theorem  4.1  generalizes  the  approach  of  Lax  [33]  using  the 
method  of  characteristics  and  generalized  Riemann  invariants.  We  transform 
(4.1), (4.2) to an equivalent first-order system as follows. Let $w = u_x$,
$v = u_t$; define

$$\sigma := \phi(w) - z, \qquad z := -a' * \psi(w), \tag{4.5}$$

and observe that $\sigma$ is the stress-strain functional (2.6). Since
$\phi'(\cdot) > 0$, equation (4.5) can be solved for $w$, $w = \phi^{-1}(\sigma + z) =: g(\sigma,z)$,
and $g$ is a smooth function on $\mathbb{R}\times\mathbb{R}$. As long as the solution $u$ of (4.1),
(4.2) remains smooth, (4.1), (4.2) is equivalent to the system

$$\begin{aligned}
v_t &= \sigma_x, \\
\sigma_t &= C^2(\sigma,z)\,v_x + a'(0)\,\psi(g(\sigma,z)) + a'' * \psi(g(\sigma,z)), \\
z_t &= -a'(0)\,\psi(g(\sigma,z)) - a'' * \psi(g(\sigma,z)),
\end{aligned} \tag{4.6}$$

$$v(x,0) = u_1(x), \quad \sigma(x,0) = \phi(u_0'(x)), \quad z(x,0) = 0, \tag{4.7}$$

where the wave speed $C(\sigma,z) := [\phi'(g(\sigma,z))]^{1/2}$ is a smooth function. The
system (4.6) is hyperbolic with eigenvalues $C$, $-C$, $0$. We define generalized
Riemann invariants $r$, $s$ by

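$$r = r(v,\sigma,z) := v + \Phi(\sigma,z); \quad s = s(v,\sigma,z) := v - \Phi(\sigma,z); \quad \Phi(\sigma,z) := \int_0^{\sigma} \frac{d\xi}{C(\xi,z)}.$$

Thus $v = (r+s)/2$ and $\Phi(\sigma,z) = (r-s)/2$; the correspondence is smoothly invertible because
$\Phi_\sigma = C^{-1} > 0$. Observe that if $a' \equiv 0$ in (4.1), $z \equiv 0$ and $g$, $C$ are
independent of $z$. In this situation $r$ and $s$ reduce to the Riemann
invariants for the system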


$$v_t = \sigma_x, \qquad \sigma_t = \phi'(\phi^{-1}(\sigma))\,v_x,$$


which  can  be  transformed  to  the  quasilinear  wave  equation.  In  the  proof 
$r$, $s$, $z$ are introduced as dependent variables and (4.6) is replaced by an
equivalent system obtained by differentiating $r$, $s$, $z$ along the
characteristics $C$, $-C$, $0$ respectively. One then differentiates the
quantities

$$\rho := v_x + \frac{\sigma_x}{C(\sigma,z)}, \qquad \tau := v_x - \frac{\sigma_x}{C(\sigma,z)}, \qquad \text{and} \quad z_x$$

along the $C$, $-C$, $0$ characteristics respectively (observe that if $a' \equiv 0$,
$\rho = r_x$, $\tau = s_x$). It is shown (see [42] for details) that to leading order the
characteristic derivatives of $\sqrt{C}\,\rho$, $\sqrt{C}\,\tau$ satisfy a coupled system of Riccati
equations in $\rho$ and $\tau$ with coefficients which are smooth functions of
$r$, $s$, $z$. The differential equation for $z_x$ is linear in $\rho$, $\tau$, $z_x$, and it
is shown that $z_x$ grows at most logarithmically. Blow-up in finite time is
established by showing that $r$, $s$, $z$ remain in a neighborhood $U$ of zero up
to the blow-up time, if they are small initially (i.e. if $\sup_{\mathbb{R}}\{|v(x,0)| + |\sigma(x,0)|\}$
is small), while $v'(x,0)$ and $\sigma'(x,0)$ are sufficiently large (with
appropriate signs). Moreover, the hypothesis $\phi''(0) \ne 0$ provides upper and
lower nonzero bounds for the coefficients of $\rho^2$ and $\tau^2$ in the Riccati
equations when $r$, $s$, $z$ are in $U$.


Remark  4.3.  A  physical  interpretation  of  conclusions  (4.3),  (4.4),  coupled 
with  examples  of  Coleman,  Gurtin,  and  Herrera  [7],  is  that  the  strain  remains 
bounded but its first derivatives become infinite as $t \to t^*$. Thus Theorem
4.1  suggests,  but  does  not  prove,  the  development  of  a  shock  front  in  finite 
time. 
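
The mechanism behind the blow-up of derivatives can be illustrated on a scalar model of the Riccati equations mentioned in the proof sketch of Theorem 4.1. The following is a toy sketch only: the equation $y' = c\,y^2$, the constant `c`, and the function names are assumptions made for illustration, whereas the system treated in [42] is coupled and has variable coefficients. For $c\,y_0 > 0$ the exact solution $y(t) = y_0/(1 - c\,y_0 t)$ becomes infinite at $t^* = 1/(c\,y_0)$, so larger initial derivatives give earlier blow-up.

```python
def riccati_blowup_time(y0, c=1.0):
    """Exact blow-up time of y' = c*y**2, y(0) = y0, valid when c*y0 > 0."""
    if c * y0 <= 0.0:
        raise ValueError("finite-time blow-up requires c*y0 > 0")
    return 1.0 / (c * y0)

def time_to_exceed(y0, c=1.0, dt=1e-4, threshold=1e6, t_max=100.0):
    """March y' = c*y**2 with classical RK4 until |y| reaches the threshold."""
    t, y = 0.0, float(y0)
    while abs(y) < threshold and t < t_max:
        k1 = c * y * y
        y2 = y + 0.5 * dt * k1
        k2 = c * y2 * y2
        y3 = y + 0.5 * dt * k2
        k3 = c * y3 * y3
        y4 = y + dt * k3
        k4 = c * y4 * y4
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return t

if __name__ == "__main__":
    for y0 in (1.0, 2.0, 4.0):
        print(y0, riccati_blowup_time(y0), time_to_exceed(y0))
```

The numerical time at which the solution exceeds a large threshold tracks the exact blow-up time, while the quantity being differentiated (the analogue of the strain) would stay bounded.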

Remark  4.4.  Certain  models  for  shearing  flows  of  viscoelastic  fluids  can  be 
analyzed  by  the  technique  of  Theorem  4.1.  With  v(x,t)  denoting  the  velocity 
of  the  fluid  in  simple  shear,  Slemrod  [48]  studies  the  problem 

$$v_t = a * \psi(v_x)_x, \quad x \in \mathbb{R},\ t > 0; \qquad v(x,0) = v_0(x), \quad x \in \mathbb{R}, \tag{4.8}$$

in the special case $a(t) = e^{-t}$. Differentiation of the equation leads to a
Cauchy  problem  of  the  form  (4.1),  (4.2).  Global  existence  for  smooth,  small 
data  follows  from  [12,  Theorem  4.1];  see  also  Remark  3.3.  Development  of 
singularities  for  large  data  is  an  easy  application  of  Theorem  4.1  above. 

Other  popular  models  for  viscoelastic  fluids  can  be  discussed  by  a  similar 
analysis.  Slemrod  [47]  and  Gripenberg  [20]  established  similar  results  for  a 
different model of shearing flows for a viscoelastic fluid. If $a(t) = e^{-t}$,
(4.8), as well as the problem studied in [47], can be transformed to the
quasilinear wave equation with linear frictional damping, for which finite time
blow-up  for  large  data  can  be  established  by  the  method  of  Lax  [33]. 
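
The reduction of (4.8) to a damped wave equation for an exponential kernel can be written out in a few lines. This is a sketch under the assumption (our reading of the notation) that the nonlinearity in (4.8) is denoted $\psi$ and that $*$ is the convolution on $[0,t]$.

```latex
% Differentiate v_t = \int_0^t a(t-\tau)\,\psi(v_x)_x(x,\tau)\,d\tau with respect to t:
v_{tt} = a(0)\,\psi(v_x)_x + a' * \psi(v_x)_x ,
% which has the form (4.1) with a(0)\psi in the role of \phi.
% For a(t) = e^{-t} one has a(0) = 1 and a' = -a, so a' * \psi(v_x)_x = -v_t and
v_{tt} + v_t = \psi(v_x)_x .
```

The term $v_t$ on the left is the linear frictional damping referred to in the text.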

We  close  this  section  by  discussing  a  number  of  open  problems. 

Remark  4.5.  The  techniques  of  proof  of  Theorem  4.1  and  that  of  [10]  depend 
crucially on the hypothesis $\phi''(0) \ne 0$. The physically important situation
$\phi''(0) = 0$, permitted in the finite time blow-up result for the quasilinear
wave equation (2.2) (with $f \equiv 0$) in [37], constitutes an interesting open
problem  for  (4.1),  (4.2). 

Remark 4.6. Singular kernels $a$ satisfying (3.2) and (3.3)(ii) ($-a'(0+) = \infty$)
violate the hypothesis $a'' \in L^1_{\mathrm{loc}}[0,\infty)$, which is crucial to the technique of
proof of Theorem 4.1 and that of the similar result in [10]. Indeed, there is
strong  evidence  based  on  the  following  arguments,  that  there  may  exist 
singular  kernels  a  such  that  (4.1)  would  have  globally  defined  smooth 
solutions,  even  if  the  data  are  arbitrarily  large.  These  arguments  suggest 
that  singular  kernels  strengthen  the  dissipation  induced  by  the  memory.  Thus 
far  it  has  not  been  possible  to  resolve  this  important  open  problem. 

First,  for  smooth  kernels  with  -a'(0+)  finite,  it  follows  from  (2.11) 
and  the  definition  of  the  constant  B  that  the  diameter  of  the  set  of 
points $q_0 > B/A$ for which $q(t) \to +\infty$ in finite time shrinks as $m(0) =
-a'(0+) > 0$ is increased. However, the derivation of (2.11) rests on the
assumption that $m(0) = -a'(0+)$ remains finite. Second, there are
interesting  results  of  Hrusa  and  Renardy  [26]  in  their  analysis  of  wave 
propagation  in  linear  visco-elasticity.  They  study  the  linear  history  value 
problem (2.9), (2.10) with $\phi'(\cdot) \equiv c^2 := \beta + \int_0^\infty m(\tau)\,d\tau$ and $\psi'(\cdot) \equiv 1$,
$u(x,t) \equiv 0$, $t < 0$, $B = \mathbb{R}$, and they adjoin step jump initial data $u(x,0)$,
$u_t(x,0)$, $x \in \mathbb{R}$. They prove that if the memory $m$ is smooth on $[0,\infty)$, the
solution has discontinuities propagating along characteristics of the linear
wave equation $u_{tt} = c^2 u_{xx}$ and a stationary discontinuity of higher order at
the initial step jumps. For singular memory kernels the propagating waves are
smoothed out. The degree of smoothing increases as the kernel becomes more
singular; the stationary discontinuities remain.

Remark  4.7.  There  is  numerical  evidence  concerning  the  development  of 
singularities  in  finite  time  for  regular  kernels  a  and  large  smooth  data. 
Markowich  and  Renardy  [39]  used  the  Lax-Wendroff  method  to  discretize  the 
hyperbolic  part  in  (4.1)  and  the  trapezoidal  rule  to  discretize  the  integral. 
They  show  that  the  method  is  second-order  convergent  and  stable  on  any  finite 
time  interval  on  which  smooth  solutions  exist.  For  spatially  periodic  and 
small  Cauchy  data,  and  for  kernels  a  which  are  finite  sums  of  decaying 
exponentials, they prove second order convergence on $[0,\infty)$. They also carry
out numerical experiments in the special case $\psi = \phi$ which exhibit the
formation of a singularity in finite time for particularly chosen $\phi$, $a$, and
suitably large $u_0$ and $u_1$. Their numerical solution exhibits but does not
prove the formation of shock fronts in $u_x$ and $u_t$ at the critical time.
Other  numerical  schemes  merit  investigation. 
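
The second-order accuracy of a trapezoidal treatment of the memory integral can be checked on a case with a closed form. The sketch below is not the scheme of [39]: it isolates only the quadrature of $(a' * g)(t)$, with the illustrative choices $a(t) = e^{-t}$ and $g \equiv 1$ (a stand-in for $\psi(u_x)_x$ at a fixed $x$), and all names are assumptions.

```python
import math

def memory_term_trapezoid(a_prime, g, t, n):
    """Trapezoidal approximation of (a' * g)(t) = int_0^t a'(t - tau) g(tau) d tau."""
    h = t / n
    total = 0.5 * (a_prime(t) * g(0.0) + a_prime(0.0) * g(t))
    for j in range(1, n):
        tau = j * h
        total += a_prime(t - tau) * g(tau)
    return h * total

def observed_order(t=1.0, n=20):
    """Estimate the convergence order by halving the step size once."""
    a_prime = lambda s: -math.exp(-s)   # derivative of the illustrative kernel exp(-t)
    g = lambda tau: 1.0                 # placeholder integrand
    exact = -(1.0 - math.exp(-t))       # closed form of the convolution
    e_coarse = abs(memory_term_trapezoid(a_prime, g, t, n) - exact)
    e_fine = abs(memory_term_trapezoid(a_prime, g, t, 2 * n) - exact)
    return math.log(e_coarse / e_fine, 2)

if __name__ == "__main__":
    print(observed_order())
```

Halving the step reduces the error by about a factor of four, which is the behaviour a second-order quadrature should show for smooth kernels.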

Remark  4.8.  Weak  Solutions.  Remarks  4.3  and  4.7  motivate  the  study  of  weak 
solutions  for  equations  such  as  (4.1),  (4.2)  governing  the  motion  of  materials 
with memory. Except for certain special situations valid for steady viscoelastic
fluid flows (Pipkin [44] and Greenberg [17]), there is no rigorous
theory  for  the  existence  of  shock  waves  and  acceleration  waves.  MacCamy  [36], 
Greenberg  and  Hsiao  [19]  have  studied  several  aspects  of  weak  solutions  but 
only  for  a  single  first-order  conservation  law  with  memory  in  one  space 
dimension.  Dafermos  and  Hsiao  [11]  proved  the  existence  of  weak  solution  of 
one-dimensional  first-order  quasilinear  hyperbolic  systems  with  memory  using 
Glimm's  modified  random  choice  method  [16]  with  fractional  steps.  However, 
their  method  requires  assumptions  of  "diagonal  dominance"  which  are  not 
satisfied  in  the  case  of  the  Cauchy  problem  (4.1),  (4.2)  modelling  a 
viscoelastic solid. They are satisfied for certain models of heat flow (see
[12]) and for the specific model (4.8) for viscoelastic fluid flow.

In  order  to  address  the  problem  of  weak  solutions  which  would  include 
one-dimensional  problems  for  viscoelastic  solids  of  the  form  (4.1),  (4.2),  a 
program  has  been  initiated  involving  analytical  techniques,  the  design  of 
numerical  algorithms  and  numerical  experiments.  We  consider  the  Cauchy 
problem  (4.1),  (4.2)  in  the  form  of  a  first-order  equivalent  system.  Let 
$w = u_x$, $v = u_t$. For classical solutions, (4.1), (4.2) is equivalent to the
system

$$\begin{aligned}
w_t &= v_x, \\
v_t &= \phi(w)_x + a' * \psi(w)_x,
\end{aligned} \tag{4.9}$$

satisfying the initial conditions

$$w(x,0) = w_0(x), \qquad v(x,0) = v_0(x). \tag{4.10}$$

It is easy to show that a weak solution (in the sense of distributions)
of (4.1), (4.2) is a weak solution of (4.9), (4.10). It is straightforward
to verify that the Rankine-Hugoniot jump conditions for elastic shocks ($a \equiv 0$ in (4.1))
are also necessary for viscoelastic shocks.

The Riemann problem is only partially understood for scalar first-order
conservation laws with memory [36], but not at all for the viscoelastic
problem (4.9), (4.10). Therefore it is difficult to use the random choice
method [16]. If $\psi \equiv \phi$, define $z := a' * \psi(w)$. Then (4.9) transforms
to the hyperbolic system with lower order source terms:


$$\begin{aligned}
w_t &= v_x, \\
v_t &= \phi(w)_x + z_x, \\
z_t &= a'(0)\,\psi(w) + a'' * \psi(w),
\end{aligned} \tag{4.11}$$

with $w(x,0)$, $v(x,0)$ satisfying (4.10) and $z(x,0) = 0$. If $\phi'(\cdot) > 0$,
(4.11) has the eigenvalues $\pm(\phi'(\cdot))^{1/2}$ and $0$. If $\phi'(\cdot) - a(0)\phi'(\cdot) > 0$,
(4.11) has a uniquely determined steady state solution. Observe that
initially $z_x \equiv 0$; one can solve the first two equations in (4.11) by various
techniques for conservation laws on the first time step, update $z$ using the
last equation and proceed forward in time. Jointly with B. Plohr we
have initiated a study of various numerical algorithms for (4.11) in the
special case

$$a(t) = \sum_{k=1}^{n} a_k \exp(-\lambda_k t), \qquad a_k > 0,\ \lambda_k > 0,$$

including the Glimm
scheme with fractional steps. One objective is to establish existence of weak
solutions  for  small  BV  data.  Another  is  to  obtain  implementable  numerical 
algorithms  which  can  be  tested  on  concrete  problems. 
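
For kernels that are finite sums of decaying exponentials, the memory term in the last equation of (4.11) can be updated from one time step to the next without storing the full history. The sketch below illustrates this standard observation; it is not the algorithm under study in the text, and the function names, the trapezoidal quadrature, and the sampled values `g` (standing for $\psi(w)$ at a fixed $x$) are assumptions. It uses $a''(t) = \sum_k a_k \lambda_k^2 e^{-\lambda_k t}$.

```python
import numpy as np

def memory_direct(lam, g, h):
    """Trapezoid values of I(t_m) = int_0^{t_m} exp(-lam (t_m - tau)) g(tau) d tau for all m."""
    n = len(g)
    out = np.zeros(n)
    for m in range(1, n):
        w = np.exp(-lam * h * (m - np.arange(m + 1))) * g[: m + 1]
        out[m] = h * (w.sum() - 0.5 * (w[0] + w[-1]))
    return out

def memory_recursive(lam, g, h):
    """Same values via I_{m+1} = e^{-lam h} I_m + (h/2) (e^{-lam h} g_m + g_{m+1})."""
    decay = np.exp(-lam * h)
    out = np.zeros(len(g))
    for m in range(len(g) - 1):
        out[m + 1] = decay * out[m] + 0.5 * h * (decay * g[m] + g[m + 1])
    return out

def second_derivative_memory(a_coef, lam, g, h):
    """(a'' * g)(t_m) for a(t) = sum_k a_k exp(-lam_k t), one recursion per exponential."""
    return sum(ak * lk ** 2 * memory_recursive(lk, g, h) for ak, lk in zip(a_coef, lam))

if __name__ == "__main__":
    h = 0.01
    t = np.arange(201) * h
    g = np.cos(3.0 * t)
    print(second_derivative_memory([1.0, 0.5], [1.0, 4.0], g, h)[-1])
```

The recursion reproduces the full-history trapezoidal sum exactly (up to rounding) at a cost per step that does not grow with time, which is convenient in a fractional-step update of $z$.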


Boldrini [3, 4] used techniques of compensated compactness to study
elastic and viscoelastic problems including the system (4.9), (4.10). These
techniques were developed by Tartar [50,51,52], Murat [40] and DiPerna [14];
in [14] DiPerna succeeded in extending these techniques and applying them to
establish the existence of weak solutions of the purely elastic one-dimensional
problem (i.e. (4.9), (4.10) with $a \equiv 0$) on $\mathbb{R}\times[0,T]$ for any $T > 0$,
without restricting the size of the data. Boldrini [4] assumes that the
memory  in  (4.9)  is  small  in  the  sense  that 

$$a = a(\delta,t), \qquad \psi(\cdot) := \phi(\cdot) + \mu g(\cdot), \tag{4.12}$$

where $\delta > 0$, $\mu > 0$ are small parameters, $g$ is a smooth function satisfying
the growth condition $|g(w)| \le \kappa|w|$, $\kappa > 0$, and $a'(\delta,t) = O(\delta)$,
$a''(\delta,t) = O(\delta)$ uniformly in $t$. In place of (4.9) he considers the
regularized system

$$\begin{aligned}
w_t &= v_x, \\
v_t &= \phi(w)_x + a'(\delta,\cdot) * (\phi(w) + \mu g(w))_x + \varepsilon v_{xx},
\end{aligned} \tag{4.13}$$

with initial data (4.10) (the Newtonian viscosity can be more general than
$\varepsilon v_{xx}$), where $\varepsilon > 0$ is a small parameter. Let $w_{\varepsilon,\delta,\mu}$, $v_{\varepsilon,\delta,\mu}$ be a
solution of (4.13), (4.10) on $\mathbb{R}\times[0,T]$ for any $T > 0$. Boldrini gives
sufficient conditions which ensure that there is a subsequence such that
$w_{\varepsilon,\delta,\mu} \to w$, $v_{\varepsilon,\delta,\mu} \to v$ on $\mathbb{R}\times[0,T]$ as $\varepsilon, \delta, \mu \to 0+$, where

p  -  0(e'2  6<"').  Moreover,  w,  v  is  a  weak  solution  of  the  purely  elastic 
problem on $\mathbb{R}\times[0,T]$. The most serious of his assumptions is the crucial
hypothesis requiring the solutions $w_{\varepsilon,\delta,\mu}$, $v_{\varepsilon,\delta,\mu}$ of (4.13) to lie in $L^\infty$
uniformly in the parameters $\varepsilon$, $\delta$, $\mu$. Since the memory is a nonlocal operator,
this assumption is difficult to verify.

Jointly with W. Rogers and T. Tzavaras, we are using compensated compactness
techniques to establish the existence of weak solutions of (4.9), (4.10).
The special case $\psi \equiv \phi$, but with the memory not small (i.e. $a$ independent of
$\delta$), is tractable by these methods and the case $\psi \ne \phi$ appears doable.
However, obtaining an invariant region in order to show that solutions of the
relevant regularized system lie in $L^\infty$ is extremely difficult. It is of
interest to note that the existence of weak solutions of the Cauchy problem
for the model first-order scalar equation with memory

$$u_t + \phi(u)_x + a' * \psi(u)_x = 0, \quad x \in \mathbb{R},\ t > 0; \qquad u(x,0) = u_0(x), \quad x \in \mathbb{R}, \tag{4.14}$$

where $a$, $\phi$, $\psi$ have the same meaning as in (4.9), can be established completely by
using the method of compensated compactness. The maximum principle proved by
Dafermos in [9] for classical solutions of (4.14) makes it possible to prove
the needed $L^\infty$ estimates for solutions of the regularized problem (i.e.
(4.14) with $\varepsilon u_{xx}$ on the right side in place of zero). This problem was
recently  solved  by  Dafermos  (oral  communication).  Unfortunately,  it  does  not 
appear  that  this  approach  can  be  extended  to  coupled  two  by  two  systems  with 
memory.

Abstract. Our continuing study of nonstrictly hyperbolic 2x2 systems of
conservation laws is described. Preliminary results on shock formation
in a special case are given. The Riemann initial value problem is
discussed in the context of the four cases arising from the
classification of nonstrictly hyperbolic equations. The solution is
outlined in one of the cases, with a discussion of some of the new
features.


1. Introduction. In this paper, we describe recent progress in
understanding systems of nonlinear hyperbolic conservation laws whose
characteristic speeds coincide at some value of the state variable. Such
nonstrictly hyperbolic equations arise in modelling three phase flow in
porous media (the primary motivation for our work) [14], in studies of
plane elastic waves [17], and in the Lundquist equations of
magnetohydrodynamics (cf. [2]). Here we consider only 2x2 systems:

$$U_t + F(U)_x = 0, \qquad -\infty < x < \infty,\ t > 0, \tag{1.1}$$

where $U = U(x,t) \in \mathbb{R}^2$ and $F: \mathbb{R}^2 \to \mathbb{R}^2$.

System (1.1) is hyperbolic if $dF(U)$ has real eigenvalues $\lambda_1(U) \le
\lambda_2(U)$. Strict hyperbolicity fails at points $U^*$ for which $\lambda_1(U^*) =
\lambda_2(U^*)$. As shown in [14], such a point $U^*$ is generically an umbilic
point: i.e., $dF(U^*)$ is a multiple of the identity, and $\lambda_1(U) \ne \lambda_2(U)$ for
$U \ne U^*$ near $U^*$. This situation differs from that in [5,9], where the
form of the equations allows the presence of a curve of values of $U$ for
which $\lambda_1(U) = \lambda_2(U)$. We remark further that an umbilic point may be
regarded as an elliptic region that has been shrunk to a point. Indeed,
perturbing the equations near an umbilic point will in general produce a
small region in which the eigenvalues of $dF(U)$ are complex (cf.
[2,4,14]).

In  a  neighborhood  of  an  umbillc  point,  the  properties  of  equation 
(1.1)  are  strikingly  different  from  properties  of  strictly  hyperbolic 
equations.  In  this  paper,  we  discuss  preliminary  results  on  shock 
formation,  and  present  a  sample  of  solutions  of  the  Rlemann  initial  value 
problem  that  is  central  to  numerical  front  tracking  [3]. 

Properties of equation (1.1) near an umbilic point $U^*$ depend on the
form of the quadratic terms in the Taylor series expansion of $F(U)$ about
$U^*$. To focus on these terms, consider purely quadratic nonlinearities $Q$:

$$U_t + Q(U)_x = 0, \qquad -\infty < x < \infty,\ t > 0. \tag{1.2}$$

In order that (1.2) be hyperbolic, we require that $dQ(U)$ has real
eigenvalues for all $U$. Then, up to a constant linear change of variables
in $U$, we may take

$$Q(U) = dC(U) \tag{1.3}$$

where

$$C(u,v) = a u^3/3 + b u^2 v + u v^2 \tag{1.4}$$

(see [14]). With this result, we can classify variations in the
properties of equation (1.1) near an umbilic point in terms of the
parameters a,b.
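
As an illustration of (1.3)-(1.4), the sketch below evaluates the Jacobian $dQ(U)$ and its characteristic speeds for given (a,b). It is a minimal example, not the authors' code; the function names `jacobian` and `char_speeds` are hypothetical. Because $dQ$ is the Hessian of $C$, it is symmetric, so the speeds are real for every $U$ and coincide at $U = 0$.

```python
import math

def jacobian(u, v, a, b):
    """Jacobian dQ(U) of Q = dC, with C = a*u**3/3 + b*u**2*v + u*v**2.

    Q = (a*u**2 + 2*b*u*v + v**2, b*u**2 + 2*u*v), so dQ is the
    symmetric Hessian of C.
    """
    return ((2*a*u + 2*b*v, 2*b*u + 2*v),
            (2*b*u + 2*v, 2*u))

def char_speeds(u, v, a, b):
    """Return the characteristic speeds (lambda_1, lambda_2), lambda_1 <= lambda_2."""
    (p, q), (_, r) = jacobian(u, v, a, b)
    mean = 0.5 * (p + r)
    # The discriminant is a sum of squares, so the speeds are always real.
    rad = math.hypot(0.5 * (p - r), q)
    return mean - rad, mean + rad

# Symmetric case a = -1, b = 0: the speeds are -2|z| and +2|z|, z = u + iv.
print(char_speeds(0.3, -0.4, -1.0, 0.0))
# At the umbilic point U = 0 the two speeds coincide.
print(char_speeds(0.0, 0.0, 2.0, 0.5))
```

In the symmetric case the computed speeds can be compared directly with the values $\mp 2|z|$ that appear in the diagonalized equations of Section 2.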

As an indication of the effect of the umbilic point, consider
rarefaction curves for system (1.2). These are integral curves of the
right eigenvectors of $dQ(U)$:

$$\dot U = r_k(U), \quad \text{where } dQ(U)\,r_k(U) = \lambda_k(U)\,r_k(U), \tag{1.5}$$

k = 1 or 2. There are three patterns of rarefaction curves, depending on
(a,b). One of these patterns splits into two cases by considering
directions of increasing characteristic speed $\lambda_k(U)$ (indicated by
arrows in Figure 2). The four cases are indicated in the (a,b) plane in
Figure 1, with the corresponding rarefaction curves shown in Figure 2.
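
The integral curves in (1.5) can be traced numerically. The sketch below is an assumption-laden illustration rather than the method used to draw Figure 2: it integrates the unit eigenvector field with a classical fourth-order Runge-Kutta step and fixes the sign of the eigenvector by continuity. The names `eigvec` and `rarefaction_curve` are hypothetical, and the curve must stay away from the umbilic point, where the eigenvectors are undefined.

```python
import math

def eigvec(u, v, a, b, k):
    """Unit eigenvector r_k of the symmetric Jacobian dQ(U); k = 1 (slow) or 2 (fast)."""
    p, q, r = 2*a*u + 2*b*v, 2*b*u + 2*v, 2*u
    rad = math.hypot(0.5 * (p - r), q)
    lam = 0.5 * (p + r) + (rad if k == 2 else -rad)
    # Null vectors of (dQ - lam*I) read off from each row; keep the larger one.
    x1, y1 = q, lam - p
    x2, y2 = lam - r, q
    x, y = (x1, y1) if math.hypot(x1, y1) >= math.hypot(x2, y2) else (x2, y2)
    n = math.hypot(x, y)
    return x / n, y / n

def rarefaction_curve(u0, v0, a, b, k, step=0.01, nsteps=50):
    """Trace an integral curve of r_k from (u0, v0) with RK4 steps."""
    pts = [(u0, v0)]
    prev = eigvec(u0, v0, a, b, k)

    def field(u, v):
        x, y = eigvec(u, v, a, b, k)
        # Eigenvectors are defined up to sign; keep a consistent orientation.
        return (x, y) if x * prev[0] + y * prev[1] >= 0 else (-x, -y)

    u, v = u0, v0
    for _ in range(nsteps):
        k1 = field(u, v)
        k2 = field(u + 0.5 * step * k1[0], v + 0.5 * step * k1[1])
        k3 = field(u + 0.5 * step * k2[0], v + 0.5 * step * k2[1])
        k4 = field(u + step * k3[0], v + step * k3[1])
        u += step * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        v += step * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        prev = k1
        pts.append((u, v))
    return pts

# Fast curve through (1, 0.3) in the symmetric case a = -1, b = 0.
curve = rarefaction_curve(1.0, 0.3, -1.0, 0.0, 2)
```

For a = -1, b = 0 one check on such a curve is that the real part of $(u+iv)^{3/2}$ stays constant along a fast curve, in line with the Riemann invariants of Section 2.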


Figure 1. The (a,b)-plane.

Figure 2. The rarefaction curves (slow waves and fast waves, Cases I-IV).

Rarefaction curves give the values of centered piecewise smooth
solutions $U(x/t)$, with $x/t = \lambda_k(U)$. These special solutions are
called rarefaction waves. Waves associated with $\lambda_1$ are called slow
waves, while those associated with $\lambda_2$ are called fast waves. In
Section 3, we show how rarefaction waves and shock waves are used to solve
certain Riemann problems. Note that the rarefaction curves are orthogonal
trajectories away from the umbilic point. The loss of rectangular geometry
of the rarefaction curves, due to the presence of the umbilic point, has a
profound effect upon the usual geometric construction of solutions of the
Riemann problem. Note that the characteristic speeds, when restricted to
their corresponding rarefaction curves, can have critical points:

$$r_k(U) \cdot \nabla\lambda_k(U) = 0 \tag{1.6}$$

corresponding to the loss of genuine nonlinearity. We refer to the lines
defined by (1.6) as inflection loci. There are three inflection loci in
Case I, and there is one inflection locus in Cases II-IV.


2. Formation of shocks. The usual strategy for studying shock
formation for strictly hyperbolic 2x2 systems is to use the Riemann
invariants to diagonalize the system. Results have been obtained even
when the equations are not genuinely nonlinear for all U [10,13].

Riemann invariants for equation (1.2) are known to exist only for
a = -1, b = 0. In this case, we rewrite equation (1.2) using complex
notation $z = u + iv$:


$$z_t - (\bar z^2)_x = 0, \qquad -\infty < x < \infty,\ t > 0. \tag{2.1}$$

Riemann invariants $\rho$, $\sigma$ for (2.1) are given by

$$w = \rho + i\sigma, \qquad w = z^{3/2}. \tag{2.2}$$

The mapping (2.2) takes the coordinate system of Case I, Figure 1 onto a
rectangular grid. This corresponds to $\rho$, $\sigma$ being constant on their
respective rarefaction curves, diagonalizing (2.1):

$$\rho_t - 2|z|\rho_x = 0 \tag{2.3}$$

$$\sigma_t + 2|z|\sigma_x = 0 \tag{2.4}$$

Following Lax [12], we differentiate (2.3) with respect to x (since
shock formation corresponds to $|\rho_x| \to \infty$ or $|\sigma_x| \to \infty$).
If we introduce a new variable

$$q = |z|^{1/2}\rho_x, \tag{2.5}$$

we find, after a straightforward calculation, that q satisfies the
equation

$$\frac{dq}{dt} = c\,\rho\,|w|^{-5/3}\,q^2, \tag{2.6}$$

where $d/dt = \partial_t - 2|z|\partial_x$, and c > 0 is a constant. Now
$\rho$ = constant in (2.6), so we easily read off that $q \to \pm\infty$ in
finite time if $\rho q > 0$ at t = 0, i.e. if $\rho(x,0)\rho_x(x,0) > 0$ for
some x. Similarly, if $\sigma(x,0)\sigma_x(x,0) < 0$ for some x, then (2.1)
cannot have globally smooth solutions, due to
$\sup|\sigma_x(x,t)| \to \infty$ in finite time.
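
The calculation leading to a Riccati equation of this type can be sketched as follows. This is a reconstruction, not text from the original: it assumes the integrating factor $q = |z|^{1/2}\rho_x$ and uses $|z| = (\rho^2+\sigma^2)^{1/3}$, which follows from (2.2). Subscripts $\rho$, $\sigma$ on $|z|$ denote partial derivatives.

```latex
\begin{align*}
&\text{Differentiating (2.3) in } x:\quad
  \frac{d\rho_x}{dt} = 2|z|_\rho\,\rho_x^2 + 2|z|_\sigma\,\sigma_x\rho_x,
  \qquad \frac{d}{dt} = \partial_t - 2|z|\,\partial_x. \\
&\text{Along these characteristics } \rho \text{ is constant and, by (2.4), }
  \frac{d\sigma}{dt} = \sigma_t - 2|z|\sigma_x = -4|z|\,\sigma_x. \\
&\text{Hence}\quad
  \frac{d}{dt}|z|^{1/2} = \tfrac12 |z|^{-1/2}|z|_\sigma\,\frac{d\sigma}{dt}
  = -2|z|^{1/2}|z|_\sigma\,\sigma_x, \\
&\text{and with } q = |z|^{1/2}\rho_x:\quad
  \frac{dq}{dt} = 2|z|^{1/2}|z|_\rho\,\rho_x^2
  = 2|z|^{-1/2}|z|_\rho\,q^2
  = \tfrac43\,\rho\,|z|^{-5/2}\,q^2 .
\end{align*}
```

Under these assumptions the constant is $c = 4/3$, and since $|w| = |z|^{3/2}$ the factor $|z|^{-5/2}$ equals $|w|^{-5/3}$. The cross term in $\sigma_x\rho_x$ cancels exactly, which is the purpose of the integrating factor.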


These conditions for the nonexistence of globally smooth solutions
have the interpretation that the initial data should reverse the
orientation of the appropriate rarefaction curve. This is precisely the
situation that guarantees shock formation for strictly hyperbolic,
genuinely nonlinear equations. The reason the same conditions apply here
is that, for equation (2.1), the rarefaction curves of one characteristic
family do not encounter inflection loci of the other family.

For $(a,b) \ne (-1,0)$, it is appropriate to use generalized Riemann
invariants [8], in order to get a coupled system of Riccati equations,
each equation having the form (2.6). The coefficient of $q^2$ will not
however automatically have a single sign for all t > 0, due to the
crossing of rarefaction curves and inflection loci of the opposite
characteristic family. It is not known how to describe the class of
smooth initial data giving rise to finite time shock formation, except in
the special case a = -1, b = 0 considered above.


3. Solution of the Riemann problem. In this section, we present some
features of the Riemann initial value problem for equation (1.2), with Q
given by (1.3), (1.4). The Riemann problem consists of finding a
physical weak solution U(x/t) of (1.2) satisfying the initial condition

$$U(x,0) = \begin{cases} U_L & \text{if } x < 0, \\ U_R & \text{if } x > 0. \end{cases} \tag{3.1}$$

The solution consists of rarefaction waves and shock waves. The former
are smooth functions, with U taking values along one of the rarefaction
curves, while the shock waves are discontinuous solutions, which we take
to satisfy the Lax admissibility condition [11]. Generally, the solution
of the Riemann problem involves a slow wave and a fast wave, separated by
a constant value of U. Each wave may be composite, although for
quadratic nonlinearities we have shown that the only physical composite
waves are slow rarefaction-shocks and fast shock-rarefactions [15].


Figure  3.  (RS)S  solution  of  the  Riemann  problem. 


A typical solution of the Riemann problem is shown in Figure 3. The
solution consists of a composite slow wave (a rarefaction-shock, denoted
by RS), and a fast shock, denoted by S. We codify the solution for these
values of $U_L$, $U_R$ by (RS)S. For a fixed $U_L$, the set of $U_R$ that
give rise to (RS)S solutions forms a region in the $U_R$ plane. By
considering all possible combinations of waves, for a fixed $U_L$, we build
a picture of regions in the $U_R$ plane. As $U_L$ varies, these regions
distort, and coalesce (for example if the strength of one of the waves
goes to zero). We thus have $U_L$ sectors: for $U_L$ in each sector, the
pattern of $U_R$ regions is qualitatively the same.

Figure 4. $U_L$ sectors. Case III.


For Case III, the sectors are shown in Figure 4, and representative
$U_R$ diagrams are shown in Figure 5. The heavy lines in Figure 4 indicate
values of $U_L$ for which comparatively major changes occur in the $U_R$
diagrams, while the fainter lines correspond to minor changes in the $U_R$
diagrams. The intermediate state $U_M$ between the two waves lies on one
of the heavy lines in Figure 5, corresponding to slow waves. The knotted
lines represent overcompressive waves, in which the two waves used to
solve the Riemann problem touch, so that there is no intermediate state
$U_M$. Another role of the knotted lines is that the solution of the
Riemann problem is discontinuous with respect to $U_R$ across this line.
Specifically, the intermediate state $U_M$ experiences a jump, from one
section of the heavy line to another section, as $U_R$ crosses the knotted
line. Note however, that the solution is continuous in the norm, due
to the touching of the slow and fast waves in the limit as $U_R$ approaches
the knotted line.

Figure 5. $U_R$ diagrams. Case III.


A detailed interpretation of the diagrams, together with diagrams
for Cases II and IV, is given in [15]. The main result here is that in
Cases II-IV, the Riemann problem for equation (1.3) has a unique physical
solution that can be constructed graphically. A computer program to
automate this solution is being developed by E. Isaacson, D. Marchesin
and B. Plohr. Our work on these Riemann problems involves a combination
of computer graphics and mathematical analysis, and owes much to a study
of the symmetric cases (b = 0 in (1.4)), given in [6,7] (see also [17]).
Case I presents special problems because the Lax admissibility condition
on shocks is too restrictive. As shown in [16], the solution of the
Riemann problem in Case I will in general require the admissibility of
certain undercompressive shocks. It is at present unknown how to
characterize these, except in the special symmetric case of equation
(2.1), for which the Riemann problem is solved in [16].

ABSTRACT. The matrix discretization of boundary value problems that occur
in hydrodynamic stability contains the frequency $\omega$ and the wavenumber
$\alpha$ of the normal modes as well as other parameters. For most temporal
stability calculations an algebraic eigenvalue problem may be posed since
$\omega$ appears linearly. Spatial stability problems are more complicated
since the eigenvalue $\alpha$ appears nonlinearly. Problems of this type are
examined in this paper. The stability of a laminar boundary layer over a
compliant wall is considered. In this case the wavenumber appears to power
four in the differential equation, the Orr-Sommerfeld equation, and to
power five in the wall boundary condition. A model for the compliant
surface is developed and the differential problem is defined. The matrix
methods applied to the solution of this problem are demonstrated on a
model problem. Eigenvalue spectra are calculated for the model problem and
the boundary layer stability problem. The methods for obtaining the
eigenvalues efficiently depend on the factorization of matrix polynomials.
Various factorization schemes are considered including Bernoulli and Traub
iteration and Newton's method.


1. INTRODUCTION. This paper is concerned with the application of matrix
factorization techniques to problems in hydrodynamic stability. A spectral
method is used to discretize the boundary value problem. Orszag [1] used a
spectral approach to obtain the eigenvalue spectrum of the Orr-Sommerfeld
equation for Poiseuille flow. He considered temporal stability in which
the eigenvalue, which is the frequency of the normal mode, appears
linearly. Thus the problem becomes an algebraic eigenvalue problem which
may be solved by a number of standard algorithms. The spatial stability
problem is more complicated since the eigenvalue, which in this case is
the wavenumber of the normal mode, appears to power four in the
differential equation. However it is this problem that is physically
realistic, in which fixed real frequency disturbances amplify
convectively. For the rigid wall boundary conditions of Poiseuille flow
the boundary conditions are independent of the eigenvalue. In this case
the spectral discretization of the boundary value problem yields an
eigenvalue problem of the form:

$$\left[\sum_{k=0}^{4} A_k\,\alpha^{4-k}\right]\mathbf{a} = 0 \tag{1}$$

where $\mathbf{a}$ is the eigenvector of Chebyshev coefficients and $\alpha$
is the wavenumber. To recast this problem as an algebraic eigenvalue
problem Benney and Orszag [2] used the Companion Matrix Method which is
described briefly below. This approach yields matrices which
are four times the size of the original matrices $A_k$. Since the operation
count of standard eigenvalue algorithms, such as the QR algorithm, is of
the order of $N^3$, this approach is computationally expensive. For this
reason Benney and Orszag [2] chose to solve the temporal eigenvalue
problem and then used transformations, which are valid for small growth
rates, to convert to the spatial stability case. It should be emphasized
that if a local iteration method is used to find the eigenvalue the
spatial problem is no more complicated than the temporal problem. However
a good first approximation for the eigenvalue is required and there is no
guarantee that all unstable or critical eigenvalues will be found. Bridges
and Morris [3] applied methods based on matrix factorization to the
eigenvalue problem in Eq. (1). This technique converts the problem in
which the eigenvalue appears nonlinearly to one in which it appears
linearly and is readily obtained by standard algorithms. The resulting
algebraic eigenvalue problem is of the same size as the original matrices
so that this technique is much more efficient than the Companion Matrix
Method. The eigenvalues yielded by this approach represent a subset of the
eigenvalues of the entire problem Eq. (1). They may be the subset of
eigenvalues with either the greatest or smallest absolute values. Bridges
and Morris [4] examined the stability of the Blasius boundary layer with
this technique. However the only successful globally convergent scheme for
this problem was found to be the Companion Matrix Method. The reasons for
this are discussed in this paper and a successful application of the
matrix factorization scheme is provided. Carpenter and Morris [5] applied
the matrix factorization scheme to the problem of the stability of a
laminar boundary layer over a non-isotropic compliant surface. Using the
matrix factorization scheme they were able to identify various modes of
instability simultaneously. However, it should be noted that the accuracy
of any eigenvalue is improved if its location is known approximately in
the complex plane.

In this paper the boundary layer stability problem over a compliant
surface is reformulated. In the new form there is no restriction on the
degree or nature of non-isotropy of the compliant surface. However, the
major emphasis of this paper is not this particular problem, but general
problems of the same type. The interesting feature of the compliant wall
stability problem is that the eigenvalue appears to a higher power in the
boundary conditions than in the differential equation. In addition, the
domain of the independent variable is unbounded, so that the problem
exhibits a continuous as well as a discrete spectrum.

In the subsequent sections the boundary value problem for the stability of a laminar
boundary layer on a non-isotropic compliant surface will be developed. A model problem
with  many  of  the  features  of  the  real  problem  will  be  introduced.  This  problem  is  solved 
by  various  methods  including  the  Companion  Matrix  Method  and  by  matrix  factorization. 
Various  schemes  for  the  factorization  of  matrix  polynomials  are  examined.  Calculations  of 
the  eigenvalue  spectrum  and  its  subset,  obtained  from  the  matrix  factorization  approach, 
are  given  for  both  the  model  problem  and  the  compliant  boundary  layer  stability  problem. 


2. PROBLEM FORMULATION. The efficiency and quietness of underwater
vehicles is affected by the nature of their boundary layers: a
fully-laminar boundary layer providing the least drag and noise. A passive
method for delaying boundary layer transition to turbulence involves the
use of a compliant surface. Early experiments by Kramer [6,7] indicated a
significant reduction in the drag on a vehicle with a compliant coating
but until recently experiments had failed to reproduce these results.
However Gaster and Daniel [8] showed that the growth of
Tollmien-Schlichting instabilities can be reduced dramatically by an
appropriate choice of compliant surface. The compliant surface used was a
silicone rubber-based substrate with a latex skin. Since such a simple
surface provided a reduction in the growth of instabilities and gave good
agreement with the predictions of linear theory it is reasonable to
examine other surfaces theoretically that could give further reduction
in the wave growth.

The  production  rate  of  fluctuation  energy,  whether  in  the  early  stages  of  transition 
or  for  a  turbulent  flow,  depends  on  the  product  of  the  Reynolds  stress  and  the  strain 
rate  of  the  basic  flow.  If  these  quantities  have  unequal  signs  there  is  production  and 
if  they  have  equal  signs  there  is  negative  production  or  decay  of  the  unsteadiness.  In  a 
boundary  layer  production  occurs  close  to  the  wall.  Grosskreutz  [9]  proposed  a  nonisotropic 
compliant  surface  that  would  force  the  production  at  the  wall  to  be  negative.  Some  stability 
calculations  for  a  model  of  this  surface  were  performed  by  Carpenter  and  Morris  [5].  A 
revised  formulation  of  this  problem  forms  the  boundary  value  problem  discussed  below. 

A simple model for the surface is shown in Fig. 1. The nondimensional
displacements of the surface $\eta$ and $\xi$ in the normal and streamwise
directions respectively are related to the angular displacement of the
swivel arms $\delta\theta$, by

$$\xi\delta^* = \ell\,\delta\theta\,\sin\theta \quad\text{and}\quad \eta\delta^* = \ell\,\delta\theta\,\cos\theta, \tag{2}$$

where $\delta^*$ is the displacement thickness of the boundary layer. These
relationships show that the production term will be negative if the swivel
arms are directed towards the flow direction and positive if the arms
point downstream. The equation of motion for an element of the surface in
the direction normal to the swivel arm may be written


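$$\rho_m b\,\frac{\partial^2(\ell\,\delta\theta)}{\partial t^2}
 = -B\,\frac{\partial^4\eta}{\partial x^4}\cos\theta - K\ell\,\delta\theta
 + Eb\,\frac{\partial^2\xi}{\partial x^2}\sin\theta
 - p_0\cos\theta + \sigma_0\cos\theta + \tau_0\sin\theta. \tag{3}$$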


x and y are the coordinates in and normal to the streamwise direction;
$\rho_m$ and b are the density and thickness of the plate; $p_0$,
$\sigma_0$ and $\tau_0$ are the pressure and the normal and shear viscous
stresses at the wall; B and E are the flexural rigidity and elastic
modulus of the plate; and K is the spring stiffness. Let the velocity
fluctuations in the (x,y) directions be (u,v) and seek a solution for the
surface displacement in the form:

$$\eta = \hat\eta\,\delta^*\exp[i(\alpha x - \omega t)]. \tag{4}$$

Continuity of normal and tangential motion at the wall then yields:

$$\omega\hat\eta = i\hat v(0) \tag{5}$$

and

$$-i\alpha\omega\sin\theta\,\hat v(0) = \alpha\cos\theta\,U'(0)\,\hat v(0) + \omega\cos\theta\,\hat v'(0). \tag{6}$$

U is the mean velocity of the boundary layer, primes denote differentiation with respect
to y and all quantities in eqns. (5) and (6) have been nondimensionalized with respect to
the freestream velocity $U_\infty$ and the displacement thickness $\delta^*$. The fluctuating stresses at
the wall may be related to the normal velocity fluctuation in the fluid using the linearized
continuity  and  momentum  equations.  If  ^  =  O'  then  r?  and  (  may  be  eliminated  and  the 
wall  boundary  conditions  may  be  written  in  terms  of  f  and  v  alone: 


r  Cb 
i  CM 

cos2  0f( 0)]  -f  a3  sin2  0?(O)] 

+  a2 

^(2u>sin0  —  3*C/'(0)  cos 

•OH 

+  a[ 

*U'{ 0)  +  *w  sin  9)  C^R  ?'(0)] 

-  [•( 

cos  6U'{  0)  +  iu  sin  0)  — ^~fw(0) 

CmR 

—  sinflcosl?f(0)j 

=  0, 

(7) 

and 

$$\alpha\,[\cos\theta\,U'(0) + i\omega\sin\theta]\,\hat v(0) + \omega\cos\theta\,\hat v'(0) = 0, \tag{8}$$

where 


r  -£lA  n  B  r  -  KS*  An-  Eb 

M  poC’  *  PaVi.f’  K  p0v >,  “  Ct  potr&f 

R is the Reynolds number $U_\infty\delta^*/\nu$. It should be noted that $\alpha$ appears to power five in
eqn. (7). In ref. [5] a different form of the condition contained $\alpha$ to power six. Also eqn.
(7) is valid for all values of $\theta$. The velocity fluctuations in the boundary layer satisfy the
Orr-Sommerfeld  equation  which  may  be  written  in  terms  of  0  and  f  as, 

+  *(»)*'  +  A(y)0  =  0, 

0'  -  f  =  0, 


where 

$$A(y) = -iR(\alpha U - \omega) - 2\alpha^2$$

and

$$B(y) = iR(\alpha U - \omega)\alpha^2 + i\alpha R U'' + \alpha^4.$$

In  addition  the  fluctuations  are  required  to  vanish  at  infinity: 

$$\hat v(y) = \hat v'(y) \to 0 \quad \text{as } y \to \infty. \tag{10}$$

In order to demonstrate the numerical methods without the complexity of
the algebra involved in the problem given by eqns. (7)-(10) a model
problem will be introduced. Though this problem does not have the
stiffness of the Orr-Sommerfeld equation it does have the correct
nonlinearity of the eigenvalue in the equation and boundary conditions.
Consider the model problem:

$$\frac{d^2\phi}{dx^2} - 2\alpha\omega\frac{d\phi}{dx} + \alpha^2\phi = 0, \qquad x \in [-1,1], \tag{11}$$

$$\phi(1) = 0, \tag{12a}$$

$$\alpha^3\phi(-1) + \frac{d\phi}{dx}(-1) = 0. \tag{12b}$$

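The exact solution to this problem is given by,

$$\phi(x) = A\exp(\alpha\omega x)\,[\sin\gamma\cos\gamma x - \cos\gamma\sin\gamma x], \tag{13}$$

where

$$\gamma = \alpha\sqrt{1 - \omega^2},$$

with

$$\tan[2\alpha\sqrt{1 - \omega^2}] = \sqrt{1 - \omega^2}/(\omega + \alpha^2). \tag{14}$$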
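
The eigenvalue relation (14) can be checked against the boundary conditions numerically. The sketch below is an illustration only: the value $\omega = 0.5$ (real, with $|\omega| < 1$), the bracket used for bisection, and the helper names `dispersion`, `bisect` and `bc_residuals` are assumptions made here, not choices stated in the paper. It finds the smallest positive real root of (14) and evaluates the residuals of (12a) and (12b) for the solution (13) with A = 1.

```python
import math

def dispersion(alpha, omega):
    """Eqn (14) multiplied through by (omega + alpha^2) cos(2 gamma), so no tan poles."""
    g = math.sqrt(1.0 - omega**2)
    return (omega + alpha**2) * math.sin(2 * alpha * g) - g * math.cos(2 * alpha * g)

def bisect(f, lo, hi, iters=80):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bc_residuals(alpha, omega):
    """Residuals of (12a) and (12b) for phi(x) = exp(alpha*omega*x) sin(gamma (1 - x))."""
    gamma = alpha * math.sqrt(1.0 - omega**2)
    def phi(x):
        return math.exp(alpha * omega * x) * math.sin(gamma * (1 - x))
    def dphi(x):
        return math.exp(alpha * omega * x) * (
            alpha * omega * math.sin(gamma * (1 - x)) - gamma * math.cos(gamma * (1 - x)))
    return phi(1.0), alpha**3 * phi(-1.0) + dphi(-1.0)

omega = 0.5
upper = math.pi / (4 * math.sqrt(1 - omega**2))   # here 2*gamma = pi/2
alpha = bisect(lambda a: dispersion(a, omega), 0.0, upper)
print(alpha, bc_residuals(alpha, omega))
```

The bracketed form $\sin\gamma\cos\gamma x - \cos\gamma\sin\gamma x$ in (13) equals $\sin\gamma(1-x)$, which is what the code uses; both residuals should vanish to rounding error at a root of (14).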

$\phi(x)$ is approximated by a finite series of Chebyshev polynomials:

$$\phi(x) = \sum_{r=0}^{N} a_r T_r(x). \tag{15}$$

The formulae for the integrals of Chebyshev polynomials are much simpler
than those for the derivatives. Thus eqn. (11) is first integrated twice
indefinitely with respect to x. Prior to substitution of the series
approximation this equation is perturbed by two additional Chebyshev
polynomials to prevent a trivial solution (see ref. [10]). Thus the
actual equation solved is,

$$\phi(x) - 2\alpha\omega\int\phi\,dx + \alpha^2\iint\phi\,dx\,dx
 = C_1 x + C_2 + \tau_{N+1}T_{N+1}(x) + \tau_{N+2}T_{N+2}(x). \tag{16}$$


When the series (15) is substituted into eqn. (16) and the boundary
conditions (12), and the coefficients of each order of Chebyshev
polynomial are set to zero, a matrix eigenvalue problem is obtained:

$$\{C_0\alpha^3 + C_1\alpha^2 + C_2\alpha + C_3\}\,\mathbf{a} = 0. \tag{17}$$

$\mathbf{a}$ is the vector of unknown Chebyshev coefficients. The equations
involving the zero-th and first order Chebyshev polynomials, which would
involve the unknown integration constants $C_1$ and $C_2$, are replaced by
the series approximation to the boundary conditions in rows N and N+1 of
the matrix equation. It should be noted that the leading coefficient matrix
$C_0$ is singular since the only terms involving $\alpha^3$ occur in one
boundary condition. Thus the elements of $C_0$ may be written,

$$C_0 = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 \\
c_{N,0} & c_{N,1} & \cdots & c_{N,N+1} \\
0 & 0 & \cdots & 0
\end{pmatrix}. \tag{18}$$


In  the  next  section  several  procedures  for  solving  the  matrix  eigenvalue  problem  (17)  will 
be  described. 


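3. NUMERICAL METHODS. The Companion Matrix Method that was used
by Benney and Orszag [2] involves the definition of two new vectors,

$$\mathbf{a}_1 = \alpha\mathbf{a} \quad\text{and}\quad \mathbf{a}_2 = \alpha\mathbf{a}_1. \tag{19}$$

With these definitions the matrix eigenvalue problem may be written in
block matrix form,

$$\left\{\begin{pmatrix} C_1 & C_2 & C_3 \\ I & 0 & 0 \\ 0 & I & 0 \end{pmatrix}
 - \alpha\begin{pmatrix} -C_0 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}\right\}
 \begin{pmatrix} \mathbf{a}_2 \\ \mathbf{a}_1 \\ \mathbf{a} \end{pmatrix} = 0. \tag{20}$$

Since $C_0$ is singular this cannot be changed to an algebraic eigenvalue
problem without first introducing a transformation,

$$\lambda = 1/(\alpha - s). \tag{21}$$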


Then  the  problem  is  readily  written  as, 


(22) 


Since the dimension of the block matrix is 3(N+1) x 3(N+1) the
computation time required to find the eigenvalue spectrum is increased by
a factor 27 over the linear problem. However eqn. (17) may be factorized
so that only a specific subset of the eigenvalue spectrum is calculated.

Let eqn. (17), after the use of the transformation (21), be written,

$$\{D_3(\lambda)\}\,\mathbf{a} = 0. \tag{23}$$

If the matrix equivalent of synthetic division is employed on eqn. (23),
then the factored form of $D_3$ is,

$$D_3(\lambda) = \{Q_2(\lambda)\}(\lambda I - Y), \tag{24}$$

where Y is a solvent or factor of $D_3$ and $Q_2$ is quadratic in $\lambda$.
It is readily shown that eqn. (24) will only be satisfied if Y is a root of
the matrix polynomial,

$$Y^3 + A_1Y^2 + A_2Y + A_3 = 0. \tag{25}$$

Thus the eigenvalue problem reduces first to finding the roots of this
matrix polynomial.

The method used by Bridges and Morris [3] that was proposed by Gohberg
et al [11] involves the use of Bernoulli iteration. This method, which is
an extension of the standard algorithm for scalar polynomials, consists of
the iterative sequence:

$$X_{i+1} + A_1X_i + A_2X_{i-1} + A_3X_{i-2} = 0, \tag{26}$$

with

$$X_0 = X_1 = 0 \quad\text{and}\quad X_2 = I. \tag{27}$$

Then,

$$\lim_{n\to\infty} X_n[X_{n-1}]^{-1} = S_1. \tag{28}$$

$S_1$ is the dominant solvent of $D_3$, that is, the solvent that contains
the eigenvalues with the maximum modulus. The convergence of this
algorithm is slow, though it can be improved dramatically if an
appropriate choice is made for the factor s in eqn. (21).
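
The iteration (26)-(28) can be sketched for 2x2 matrices in a few lines. This is an illustration under stated assumptions, not the implementation of ref. [3]: the coefficient matrices below are a made-up test case built so that the dominant solvent is known (eigenvalues 5 and -4), the helper names are hypothetical, and no rescaling of the iterates is done, so the iteration count must stay small enough to avoid overflow.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matsum(*Ms):
    """Entrywise sum of 2x2 matrices."""
    return [[sum(M[i][j] for M in Ms) for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

def inv(A):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def bernoulli(A1, A2, A3, iters=40):
    """Approximate the dominant solvent of Y^3 + A1 Y^2 + A2 Y + A3 = 0."""
    zero = [[0.0, 0.0], [0.0, 0.0]]
    ident = [[1.0, 0.0], [0.0, 1.0]]
    x_im2, x_im1, x_i = zero, zero, ident           # X_0, X_1, X_2 of (27)
    for _ in range(iters):
        # (26): X_{i+1} = -(A1 X_i + A2 X_{i-1} + A3 X_{i-2})
        x_next = scale(-1.0, matsum(matmul(A1, x_i),
                                    matmul(A2, x_im1),
                                    matmul(A3, x_im2)))
        x_im2, x_im1, x_i = x_im1, x_i, x_next
    return matmul(x_i, inv(x_im1))                  # (28)

# Test coefficients: scalar cubics with roots (5, 1, 0.5) and (-4, 1, 0.2),
# coupled by a similarity transform so the matrices are not diagonal.
A1 = [[-6.5, 9.3], [0.0, 2.8]]
A2 = [[8.0, -12.6], [0.0, -4.6]]
A3 = [[-2.5, 3.3], [0.0, 0.8]]
S1 = bernoulli(A1, A2, A3)
print(S1)
```

For this test case the ratio of successive root moduli is 1/5 and 1/4, so convergence is fast; when the moduli are close, as the text notes, convergence is slow unless the shift s is chosen well.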

It is reasonable to seek quadratically convergent schemes to find the
roots of the matrix polynomial. However such standard scalar algorithms as
Newton's method are not readily extended to the matrix polynomial.
Consider the matrix polynomial,

$$Y^2 + C_1Y + C_2 = 0. \tag{29}$$

If an iteration sequence is developed of the form,

$$Y_{i+1} = Y_i + \Delta_i, \tag{30}$$

then it is readily shown that $\Delta_i$ satisfies an equation of the form,

$$A_i\Delta_i + \Delta_iB_i = C_i. \tag{31}$$

Bartels and Stewart [12] developed an algorithm to solve for $\Delta_i$ in
$O(N^3)$ operations by triangularizing the matrices $A_i$ and $B_i$. A more
efficient scheme was developed by Golub et al [13], still requiring
$O(N^3)$ operations. However if a higher order matrix polynomial is
considered, such as given by eqn. (25), then the equation for $\Delta_i$ is,

$$A_i\Delta_i + B_i\Delta_iC_i + \Delta_iD_i = E_i. \tag{32}$$

There does not appear to be a particular algorithm for solving this
equation. Clearly higher order matrix polynomials will lead to more
complicated equations. However it is always possible to construct a
system of equations for the $N^2$ unknown elements of $\Delta_i$. Since
$\Delta_i$ is only a small correction in the Newton's method it should not
have to be evaluated with a high degree of accuracy. Thus with suitable
preconditioning an iterative solution of eqn. (32) might not be too
lengthy. This possibility is being considered by the author.
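
The forms (31) and (32) follow from linearizing the matrix polynomial about the current iterate. The derivation below is a sketch added for illustration; the identification of the coefficient matrices is an assumption consistent with (29)-(32), not a statement taken from the original. Here $P(Y)$ denotes the matrix polynomial, and terms of second order in $\Delta_i$ are dropped.

```latex
\begin{align*}
\text{Quadratic (29):}\quad
 P(Y_i + \Delta_i) &= P(Y_i) + (Y_i + C_1)\Delta_i + \Delta_i Y_i + O(\Delta_i^2), \\
 \Rightarrow\quad (Y_i + C_1)\,\Delta_i + \Delta_i\,Y_i &= -P(Y_i). \\[4pt]
\text{Cubic (25):}\quad
 P(Y_i + \Delta_i) &= P(Y_i) + (Y_i^2 + A_1Y_i + A_2)\Delta_i
   + (Y_i + A_1)\Delta_i Y_i + \Delta_i Y_i^2 + O(\Delta_i^2), \\
 \Rightarrow\quad (Y_i^2 + A_1Y_i + A_2)\,\Delta_i
   + (Y_i + A_1)\,\Delta_i\,Y_i + \Delta_i\,Y_i^2 &= -P(Y_i).
\end{align*}
```

The first result has the two-term (Sylvester) form of (31), with the right-hand side equal to $-P(Y_i)$. The second has the three-term form of (32), in which $\Delta_i$ is multiplied on both sides in the middle term; this is why the triangularization approach for (31) does not carry over directly.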

The last algorithm to be considered is that developed by Dennis et al
[14]. This is a two-stage algorithm based on the algorithm by Traub for
scalar polynomials, ref. [15]. The algorithm consists of the construction
of the equivalent of the G-polynomials,

$$G_0(Y) = I, \qquad G_{n+1}(Y) = G_n(Y)\,Y - \Gamma_1^{(n)}D_3(Y), \tag{33}$$

where

$$G_n(Y) = \Gamma_1^{(n)}Y^2 + \Gamma_2^{(n)}Y + \Gamma_3^{(n)}. \tag{34}$$

The second stage of the algorithm consists of constructing the iterative
sequence,

$$Y_0 = \Gamma_1^{(L)}\,(\Gamma_1^{(L-1)})^{-1}, \tag{35a}$$

and,

$$Y_{i+1} = G_L(Y_i)\,G_{L-1}^{-1}(Y_i). \tag{35b}$$

The first stage of the algorithm with $Y_0$ given by eqn. (35a) is
equivalent to Bernoulli iteration. The use of the second stage of the
algorithm does not change the linear convergence of the iteration but the
asymptotic error constant may be made as small as desired by increasing
the number of first stage iterations. It would appear to be very desirable
to extend the iterative schemes based on the generalized G-polynomials of
ref. [15] to the matrix case. However this extension would require
properties of matrix derivatives that do not appear to be available.

In  the  next  section  some  numerical  examples  of  the  application  of  these  algorithms 
will  be  given. 

4. CALCULATIONS. First the model problem given by eqns. (11) and (12) will
be considered. Table 1 shows the eigenvalue spectrum given by the Companion Matrix
Method with A = 0.5 and N = 12. It can be seen that the spectrum contains N - 1
"infinite" eigenvalues. This corresponds to the fact that the leading coefficient matrix $C_0$
has rank unity. The corresponding behavior for scalar polynomials is given by an "infinite"
root when the leading coefficient of the polynomial tends to zero. The roots on the real
axis are close to multiples of $\pi$ as could be inferred from the eigenvalue relationship given
by eqn. (14). The roots away from the real axis occur in complex conjugate pairs.
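
The scalar analogue of an "infinite" eigenvalue can be seen in a two-line example. The polynomial $\epsilon x^2 + x - 1$ used below is an arbitrary illustration chosen here, and the function name `quadratic_roots` is hypothetical. As $\epsilon \to 0$ one root stays near 1 while the other grows like $-1/\epsilon$.

```python
import math

def quadratic_roots(eps):
    """Roots of eps*x**2 + x - 1 = 0 for eps > 0, computed without cancellation."""
    d = math.sqrt(1.0 + 4.0 * eps)
    big = -(1.0 + d) / (2.0 * eps)   # runs off to -infinity as eps -> 0
    small = -1.0 / (eps * big)       # product of the roots is -1/eps
    return small, big

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, quadratic_roots(eps))
```

In the matrix case a singular leading coefficient plays the role of $\epsilon = 0$, so some eigenvalues of the companion problem sit at infinity rather than merely being large.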

Figure  2  shows  the  finite  roots  given  in  Table  I  as  well  as  the  roots  obtained  using 
Traub  iteration.  The  number  of  first  and  second  stage  iterations  was  10  and  5  respectively. 
The  four  eigenvalues  closest  to  the  value  of  the  shift  s  in  eqn.  (21),  which  was  0.5,  are 
very  accurately  obtained.  However  the  remaining  eight  eigenvalues  do  not  correspond  to 
the  values  given  by  the  Companion  Matrix  Method.  The  reason  for  this  is  unclear  though 
the  occurrence  of  the  complex  conjugates  in  this  problem  suggests  that  a  dominant  solvent 
may  not  exist.  However  shifting  the  value  of  s  to  a  complex  value  did  not  alter  the  result. 
Thus  the  reason  for  the  failure  of  the  factorization  scheme  in  this  case  remains  unclear. 
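
The effect of the shift in eqn. (21) on which eigenvalues a dominant solvent would contain can be illustrated with a short sketch. The sample values of $\alpha$ below are invented for illustration and the function name `shifted` is hypothetical; the code maps each $\alpha$ to $\lambda = 1/(\alpha - s)$ and orders the results by decreasing modulus. An eigenvalue exactly equal to s would give a division by zero and is assumed not to occur.

```python
def shifted(alphas, s):
    """Return (lambda, alpha) pairs, lambda = 1/(alpha - s), by decreasing |lambda|."""
    pairs = [(1.0 / (al - s), al) for al in alphas]
    return sorted(pairs, key=lambda t: -abs(t[0]))

# Made-up sample spectrum and the shift s = 0.5.
sample = [0.1, 0.45, 0.6, 3.0, -3.0, 0.5 + 0.2j]
for lam, al in shifted(sample, 0.5):
    print(al, abs(lam))
```

The eigenvalues nearest the shift acquire the largest $|\lambda|$, which is consistent with the observation that the eigenvalues closest to s are the ones obtained most accurately.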
In spite of the difficulties with the model problem, it has served to illustrate the
numerical methods. These methods have been applied to the more complex problem
posed by the Orr-Sommerfeld equation and the boundary conditions corresponding to a
non-isotropic compliant surface. Rather than detail the hydrodynamic properties of such a
surface, a special case will be considered. It was mentioned earlier that Bridges and Morris
[5] had difficulty in applying matrix factorization techniques for the rigid wall boundary
layer. The reason for this can be seen if the Companion Matrix Method is applied to the
boundary value problem given in Section 2 for the case of a massive wall; that is, as C_m → ∞.
In this case the compliant surface problem reduces to the rigid wall case. Figure 3 shows
the resulting eigenvalue spectrum for N = 24, R = 2240 and ω = 0.05. This case gives 96
finite eigenvalues. The eigenvalues shown in Fig. 3 contain both discrete eigenvalues and
the Chebyshev approximation to the four branches of the continuous spectrum (see ref. [16]).
It can be seen that many of the eigenvalues are clustered around α = 0. If no shift in the
eigenvalue, as given by eqn. (21), is used then the eigenvalues of the minimal solvent will
not include the discrete eigenvalue close to α = 0.3. The minimal solvent was sought in ref.
[4] and the discrete eigenvalue could not be obtained. The same spectrum of eigenvalues
is shown in the c-plane in Fig. 4, where c = ω/α. The Tollmien-Schlichting instability
is indicated. One branch of the continuous spectrum forms a semi-circle in the c-plane
between c = 0 and c = 1. The attempt by the finite Chebyshev series to approximate this
branch is seen clearly in this figure. If the eigenvalue problem is shifted by s = 0.3 then
Traub iteration gives the spectrum shown in Fig. 5. The dominant eigenvalues have been
sought using the iteration scheme given in Section 3. All of the eigenvalues associated
with the approximation to the continuous spectrum that gave values of c close to zero
have been eliminated. The spectrum given in Fig. 5 was obtained with 5 first stage and
5 second stage iterations. No accurate computation times were obtained but the matrix
factorization scheme was considerably faster than the Companion Matrix Method.

In  this  section  several  examples  of  the  application  of  matrix  factorization  schemes 
have  been  given.  It  is  clear  that  they  offer  a  considerable  advantage  over  other  schemes  in 
the  solution  of  eigenvalue  problems  in  which  the  eigenvalue  appears  nonlinearly.  It  is  also 
clear  that  methods  for  factorizing  matrix  polynomials  that  have  high  rates  of  convergence 
are  still  needed. 
      α_real             α_imag

   0.4406715E+00      0.4510956E-14
   -.2992489E+00      0.1022875E+01
   -.2992489E+00      -.1022875E+01
   0.3186915E+01      -.2216713E-08
   -.3093725E+01      0.1475552E-08
   0.6838718E+01      0.1359535E+01
   0.6838718E+01      -.1359535E+01
   0.6186921E+01      0.2801617E-08
   0.6912758E+01      0.3878241E+01
   0.6912758E+01      -.3878241E+01
   -.6061245E+01      0.1430764E-07
   0.6813279E+01      0.7021262E+01
   0.6813279E+01      -.7021262E+01
   0.1049901E+02      0.4074714E-07
   -.6727921E+01      0.1523704E+01
   -.6727921E+01      -.1523704E+01
   -.6934206E+01      0.4003945E+01
   -.6934206E+01      -.4003945E+01
   -.6876758E+01      0.7071108E+01
   -.6876758E+01      -.7071108E+01
   -.1046709E+02      0.5751595E-09
   0.9339952E+01      0.1092799E+02
   0.9339952E+01      -.1092799E+02
   -.9412304E+01      0.1088636E+02
   -.9412304E+01      -.1088636E+02
   0.1471219E+18      0.0000000E+00
   0.2924940E+16      0.0000000E+00
   0.1196801E+16      0.0000000E+00
   0.2813020E+16      0.0000000E+00
   -.3321595E+16      0.0000000E+00
   -.2542934E+17      0.0000000E+00
   0.2955202E+16      0.0000000E+00
   0.1349990E+17      0.0000000E+00
   -.2681021E+16      0.0000000E+00
   -.3430352E+16      0.0000000E+00
   -.4477007E+16      0.0000000E+00

Table I  Eigenvalues of model problem
= √3/2,  N = 12,  tan(α) = 1/(√3 + 2α²)
ABSTRACT

Two methods for computing the effect of curvature on the speed of finite
reaction rate detonations are studied. One method involves fine grid computations
using a method which gives a solution of high quality and is taken as
exact. The other is based on recent work by one of the authors (J.J.) in
which an asymptotic model for expanding detonation waves is presented and
analyzed. Both methods assume cylindrical geometry.

The asymptotic model consists of a pair of quasi steady state ordinary differential
equations for the flow velocity and a reaction progress variable. The
equations  are  correct  for  large  times  and  large  radii.  For  each  value  of  the 
shock  radius,  the  speed  of  the  weak  detonation  is  well  defined  as  the  solution 
of  a  shooting  problem  between  the  shock  and  a  critical  point  in  the  phase 
plane. 

In  this  report  we  discuss  the  computational  problems  involved  in  applying 
these  methods.  We  further  show  numerically  that  the  model  equations  are 
accurate  to  first  order  in  powers  of  the  inverse  radius.  Finally,  we  discuss 
how  this  new  theory  may  be  used  in  conjunction  with  the  method  of  front 
tracking  to  numerically  solve  detonation  problems  in  which  weak  detonations 
develop  due  to  the  curvature  of  the  geometry. 


1.  Introduction

We study the influence of the radius of curvature on the speed of a cylindrically expanding
detonation wave with finite reaction rate. In doing so, we also study the transition from
strong to weak detonations in an expanding geometry. Finally, we study the effect of curvature
on the reaction zone. The central issue to be analyzed is the consequence of radially
induced cooling on the chemical reaction. See [7] for a review of this topic and more generally
of the theory of detonations in the presence of endothermic effects.

Two numerical methods are used to solve this problem. First, a one dimensional random
choice computation with operator splitting for both the radial effects and the effects of
the finite reaction rate is employed. Since this method resolves the reaction zone numerically
(in contrast to [2]), it includes curvature effects on the detonation velocity. This method
gives an accurate solution for grids fine enough to capture the dynamics within the reaction

1.  Supported  in  part  by  the  Applied  Mathematical  Sciences  subprogram  of  the  Office  of  Energy 
Research,  U.  S.  Department  of  Energy,  under  contract  DE-AC02-76ER03077. 

2.  Supported  in  part  by  the  Army  Research  Office,  grant  DAAG29-83-K0188. 
zone. Next, we discuss the application of recent contributions by J. Jones [10], who derived
a system of quasi steady state ordinary differential equations to describe expanding detonations.
We solve these equations numerically to find the wave speed and to resolve the reaction
zone. The solutions of Jones’ equations are found to be correct to first order in powers
of inverse radius, thereby confirming and validating his analysis.

A motivation for this work was to enhance the front tracking algorithm (see [3]) to
allow calculation of curvilinear detonation fronts in their transition from strong to weak
detonations.

2.  The Random Choice Computation

In  this  section,  we  discuss  the  solution  to  the  equations  of  reactive  gas  dynamics  with 
finite  reaction  rates  in  a  symmetric  geometry  using  the  random  choice  method.  The 
Zel’dovich-von Neumann-Döring (ZND) model of detonations (see [7]) is used. For this
model,  the  equations  of  inviscid  gas  dynamics  with  cylindrical  symmetry  become 

(2.1)  $w_t + f(w)_r = C - \alpha G$,

where

$$\alpha = \begin{cases} 0 & \text{for planar geometry,} \\ 1 & \text{for cylindrical geometry,} \\ 2 & \text{for spherical geometry.} \end{cases}$$

C and αG are, respectively, the source terms due to combustion and geometry. In these
equations, ρ is the density of the gas, m is the momentum density, P is the pressure and λ is
the mass fraction of burned gas (0 ≤ λ ≤ 1). The energy per unit volume, e, may be written
as

$$e = \rho\epsilon + \frac{\rho u^2}{2},$$

where u is the velocity and ε is the specific internal energy. Assuming a polytropic equation
of state,

$$\epsilon = \frac{P}{\rho(\gamma - 1)} + (1 - \lambda)q,$$

with γ > 1. In order to simplify the formulas, the polytropic constant, γ, is assumed to
have the same value in the unburned, burned and reacting gas. The heat released during
combustion is q; T is the temperature (T = P/ρ) and R(λ,T) is the reaction rate. We use
Arrhenius kinetics, which yields an infinite reaction length. Thus,


$$R(\lambda, T) = \begin{cases} k(1 - \lambda)\exp\left(-\dfrac{E}{T}\right), & T \ge T_c, \\ 0, & T < T_c, \end{cases}$$


where k is the rate multiplier, and E is the activation energy. We introduce T_c, the critical
temperature below which the reaction rate is taken to be identically zero, in order to allow
for  quenching  and  to  eliminate  the  cold  boundary  effect.  That  is,  if  there  were  no  critical 
temperature,  the  reaction  rate  would  be  positive  even  for  cold  gases.  Then,  the  unburned  gas 
would  begin  to  burn  before  the  shock  wave  encountered  it. 
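
The cutoff form of the rate law can be written in a few lines. This is a minimal sketch, assuming the rate is k(1 - λ)exp(-E/T) for T ≥ T_c and zero otherwise, in units where T = P/ρ; the function and argument names are illustrative and do not come from the original code.

```python
import math


def reaction_rate(lam, temperature, k, activation_energy, t_crit):
    """Arrhenius rate k*(1 - lam)*exp(-E/T), cut off to zero below the critical temperature.

    lam is the mass fraction of burned gas (0 <= lam <= 1). The cutoff keeps
    cold, unshocked gas from burning ahead of the shock.
    """
    if temperature < t_crit:
        return 0.0
    return k * (1.0 - lam) * math.exp(-activation_energy / temperature)
```

For example, with E = 100 and T_c = 215, a state at T = 200 gives a zero rate, so gas below the critical temperature stays unburned until the shock heats it.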

To  solve  this  system  numerically,  we  employ  operator  splitting  [13].  At  the  start  of  a 
time  step,  we  solve  the  homogeneous  system 


(2.2)  $w_t + f(w)_r = 0$

by the random choice method [9], [4]. The Newton’s method of [2] is employed to solve the
Riemann problems that arise in this computation. Next, we use the solution of eq. (2.2) as
initial data for the system of ordinary differential equations for the geometrical source terms,


(2.3)  $w_t = -\alpha G$.

Finally, we use the solution of eq. (2.3) as initial data to solve

$w_t = C$,

the equation for the source term due to chemistry. This sequential operator splitting calculation
converges under mesh refinement. Colella, Majda and Roytburd [5] have used a three
part splitting in their fractional step method computations for reacting gases.
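
The idea of sequential splitting, and its convergence as the time step is refined, can be illustrated on a toy problem. The sketch below is not the gas dynamics system: it assumes a scalar model equation dw/dt = -g w + k(1 - w), in which the first term stands in for a geometric sink and the second for a chemical source, and all names and parameter values are hypothetical. Each sub-step is solved exactly, so the only error is the splitting error.

```python
import math


def split_step(w, dt, g, k):
    """One sequential (Lie) splitting step for dw/dt = -g*w + k*(1 - w)."""
    w = w * math.exp(-g * dt)                # sub-step 1: dw/dt = -g*w, solved exactly
    w = 1.0 - (1.0 - w) * math.exp(-k * dt)  # sub-step 2: dw/dt = k*(1 - w), solved exactly
    return w


def integrate_split(w0, t_end, n_steps, g, k):
    """Advance from t = 0 to t_end with n_steps splitting steps."""
    dt = t_end / n_steps
    w = w0
    for _ in range(n_steps):
        w = split_step(w, dt, g, k)
    return w


def exact(w0, t, g, k):
    """Exact solution of the unsplit equation."""
    w_inf = k / (g + k)
    return w_inf + (w0 - w_inf) * math.exp(-(g + k) * t)


def splitting_error(n_steps, w0=0.0, t_end=1.0, g=1.0, k=2.0):
    """Absolute error of the split solution at t_end."""
    return abs(integrate_split(w0, t_end, n_steps, g, k) - exact(w0, t_end, g, k))


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, splitting_error(n))
```

Halving the time step roughly halves the error, which is the first order behavior expected of sequential splitting when the two sub-operators do not commute.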

A  plot  of  pressure  vs.  distance  for  a  stable  planar  reaction  is  shown  in  Fig.  2a  at  the 
start  of  a  calculation,  initialized  with  the  steady  state  solution,  and  after  several  hundred  time 
steps  using  the  method  described  above.  In  addition,  reactions  which  have  parameters 
chosen  to  yield  unstable  detonations  are  modelled  well  by  this  method.  In  an  example  of  an 
unstable  detonation,  our  results  agree  with  those  of  Erpenbeck  [6]  ,  Mader  [11]  and  Fickett 
and  Wood  [8]  (see  Fig.  2b). 

We  note  that  when  performing  these  calculations  one  must  take  care  not  to  introduce 
spurious  effects  due  to  the  numerical  modelling  techniques.  One  should  include  enough  grid 
points  in  the  region  of  chemical  activity  in  the  reaction  zone.  Also,  as  the  computation 
progresses,  the  region  of  chemical  combustion  grows  for  the  Arrhenius  model  of  kinetics. 
To  deal  with  this  problem,  we  eliminate  the  portion  of  the  computational  region  more  than  a 
certain distance behind the initiating shock wave. In doing so, care must be taken to eliminate
only regions in which there are very small variations in the gas states. Introducing even
weak waves into the reacting gas in this elimination process is equivalent to releasing small
amounts of energy on a slow time scale and can cause large errors. This phenomenon was
studied  by  Bdzil  [1]. 
3.  The Asymptotic Method of Jones

In  [10],  J.  Jones  derived  and  analyzed  a  method  of  calculating  the  effect  of  curvature 
on  the  speed  of  an  expanding  cylindrical  or  spherical  detonation  wave  to  first  order  in  powers 
of  the  inverse  of  radius  of  curvature.  This  theory  also  predicts  the  state  of  the  gas  through 
the  reacting  region. 

The  radius  of  curvature  of  the  detonation  wave  is  assumed  to  be  much  larger  than  the 
length  of  the  reaction  zone,  where  the  reaction  zone  length  is  taken  to  be  the  distance  from 
the  initiating  shock  wave  to  the  point  at  which  90%  of  the  gas  is  burned.  It  is  also  assumed 
that  the  reaction  has  proceeded  for  many  reaction  zone  lengths  so  that  initial  transients  are 
eliminated. Thus the run has settled down to a quasi steady state. Further, the state of the
unburned gas ahead of the shock is constant with zero velocity. Through the methods of perturbation
theory, eliminating higher order terms, Jones derived the following system of ordinary
differential equations from eq. (2.1):


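(3.1)

$$u_x = \frac{q(\gamma - 1)k(1 - \lambda)\exp(-E/T) - [\ldots]}{(\dot z - u)^2 - c^2},$$

$$\lambda_x = \frac{R(\lambda, T)}{\dot z - u},$$

$$c^2 = c_0^2 + \frac{\gamma - 1}{2}\left(\dot z^2 - (\dot z - u)^2\right) + q(\gamma - 1)\lambda,$$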

where c_0 represents the sound speed in the unburned gas ahead of the shock, c is the speed
of sound of the reacting gas, $c = (\gamma P/\rho)^{1/2}$, x is the distance behind the initiating shock wave,
z is the radius of curvature of the shock, and $\dot z$ is the wave speed.

In  the  case  of  an  undriven  planar  detonation,  the  reaction  terminates  at  the  Chapman- 
Jouguet  (CJ)  point  on  the  Hugoniot  curve.  This  is  a  sonic  point.  That  is,  a  point  at  which  the 
wave moves at sound speed with respect to the gas behind it. However, an expanding detonation
is weakened by expansion induced rarefactions coming from behind the shock and the
termination  point  for  the  reaction  moves  below  the  CJ  point  yielding  a  weak  detonation.  The 
flow  is  subsonic  behind  a  shock  but  supersonic  behind  a  weak  detonation.  Thus,  a  transition 
from  subsonic  to  supersonic  flow  must  occur  in  the  reacting  gas. 

Since  the  denominator  of  the  first  of  eqs.  (3.1)  vanishes  at  any  sonic  point,  in  order  to 
have  a  smooth  transition  through  a  sonic  point,  the  numerator  must  also  vanish  there.  The 
transformation 
(3.2)

where

$$c^2 = c_0^2 + \frac{\gamma - 1}{2}\left(\dot z^2 - v^2\right) + q(\gamma - 1)\lambda.$$


This transformation does not change the structure of the phase plane and the critical point
conditions for this system are the same as the conditions for a smooth sonic transition mentioned
above. These conditions are:


(3.3)

$$q(\gamma - 1)k(1 - \lambda)\exp(-E/T) - [\ldots] = 0,$$

$$k(1 - \lambda)\exp(-E/T)\,(c^2 - v^2) = 0,$$

$$c_0^2 + \frac{\gamma - 1}{2}\left(\dot z^2 - v^2\right) + q(\gamma - 1)\lambda = c^2.$$


From  a  computational  point  of  view,  eqs.  (3.2)  are  easier  to  work  with  than  eqs.  (3.1),  see 
also  [10]. 

To solve for the wave speed and resolve the reaction zone we proceed as follows. We
guess a value for v_0 (that is, v immediately behind the initiating shock) and solve for the critical
point of the system of ordinary differential equations (3.2) by iterating on λ and v, using
the equations

which are derived from eqs. (3.3), using the fact that v² = c² at the sonic point. The square
root takes the same sign as $\dot z$. We then integrate system (3.2) numerically with the initial
conditions


$v(0) = v_0$
$\lambda(0) = 0$

to find the trajectory of the solution in the v-λ plane for the given z. We update the
values of v_0 and $\dot z$ based on this trajectory by a bisection method until the trajectory passes
within a specified tolerance of the critical point. The solution is continued through the critical
point by finding the eigenvectors there. The equation, see [10],


$\rho_0 \dot z = \rho v$,

and  the  last  of  eqs.  (3.2)  are  used  to  compute  the  pressure  and  density  through  the  reacting 
region. 
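
The algebra behind two of these steps can be sketched in code. The block below assumes the sound speed relation c² = c_0² + ((γ - 1)/2)(ż² - v²) + q(γ - 1)λ, the definition c² = γP/ρ, and that the relation taken from [10] is mass conservation across the wave, ρ_0 ż = ρ v; the last point is a reading of the printed equation, not a confirmed one. Setting v² = c² in the sound speed relation and solving for v gives the sonic velocity, with the square root taking the sign of ż. All function and argument names are hypothetical.

```python
import math


def sound_speed_sq(v, lam, zdot, c0_sq, gamma, q):
    """c^2 from the assumed relation c^2 = c0^2 + (gamma-1)/2*(zdot^2 - v^2) + q*(gamma-1)*lam."""
    return c0_sq + 0.5 * (gamma - 1.0) * (zdot**2 - v**2) + q * (gamma - 1.0) * lam


def sonic_velocity(lam, zdot, c0_sq, gamma, q):
    """Solve v^2 = c^2(v) for v; the square root takes the same sign as zdot."""
    v_sq = (2.0 * c0_sq + (gamma - 1.0) * zdot**2
            + 2.0 * q * (gamma - 1.0) * lam) / (gamma + 1.0)
    return math.copysign(math.sqrt(v_sq), zdot)


def density_and_pressure(v, lam, zdot, rho0, c0_sq, gamma, q):
    """Density from rho0*zdot = rho*v (assumed), pressure from c^2 = gamma*P/rho."""
    rho = rho0 * zdot / v
    pressure = rho * sound_speed_sq(v, lam, zdot, c0_sq, gamma, q) / gamma
    return rho, pressure


if __name__ == "__main__":
    gamma, q, rho0, p0, zdot = 1.2, 50.0, 1.0, 1.0, 5.0  # illustrative values only
    c0_sq = gamma * p0 / rho0
    v_sonic = sonic_velocity(0.9, zdot, c0_sq, gamma, q)
    print(v_sonic, density_and_pressure(v_sonic, 0.9, zdot, rho0, c0_sq, gamma, q))
```

As a consistency check, evaluating density_and_pressure at v = ż and λ = 0 returns the ahead state (ρ_0, P_0), as it should for unshocked, unburned gas.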

In Fig. 3a, we present the v-λ phase plane portrait as well as the sonic locus for the
value  of  z  which  yields  a  sonic  transition  for  the  given  data.  The  curve  passes  through  the 
critical  point  (S)  after  which  the  burning  continues  on  the  supersonic  side  of  the  sonic  locus. 
4.  Results

In  Fig.  4a,  we  compare  a  plot  of  the  pressure,  immediately  behind  the  shock  wave 
which  initiates  the  detonation,  vs.  time  for  a  planar  CJ  detonation,  computed  by  the  random 
choice method described in §2 (α = 0), with a plot of the results from a cylindrical computation
(α = 1) and with the results of the method of Jones where the radius of curvature is
assumed  to  be  the  same  as  for  the  cylindrical  computation  at  all  times.  The  vertical  error 
bars  give  the  highest  and  lowest  values  of  pressure  over  each  100  time  steps  for  the  random 
choice  computations.  As  expected,  the  pressures  computed  by  the  two  cylindrical  methods 
approach  the  planar  CJ  pressure  just  behind  the  shock  as  the  radius  of  curvature  increases. 
We note that the random choice computations are initialized with the planar steady state solution
and it takes some time for the initial transients to disappear in the cylindrical run. When
we  initialized  the  random  choice  method  with  the  results  of  Jones’  method  at  a  small  radius 
the transients were much smaller but the results for larger radii were not significantly different
from those of the planar initialization. A comparison of the pressures behind the
shock wave using these initializations is seen in Fig. 4b. In these computations, we eliminated
the regions more than 3 reaction zone lengths behind the initiating shock wave.

To  exhibit  the  validity  of  Jones’  method  to  leading  order  in  inverse  radius,  we  present, 
in  Fig.  4c,  a  plot  of  pressure  behind  the  initiating  shock  wave  vs.  inverse  radius  for  the 
numerical methods described in §2 and §3. We also show the line predicted by the theory of
Jones  for  the  leading  order  corrections  to  pressure  due  to  curvature,  based  on  computations 
using  Jones’  equations  with  very  large  radius  of  curvature.  The  oscillations  in  the  random 
choice  computation  are  due  to  the  numerical  method  and  decrease  with  refinement  of  the 
grid. A similar plot is achieved for the corrections of detonation wave speed due to curvature
(Fig. 4d).

Fig.  4e  shows  the  states  of  a  reacting  gas  for  a  planar  CJ  detonation,  for  an  expanding 
cylindrical  detonation  using  the  method  described  in  §2,  and  for  the  method  of  Jones  at  a 
fixed  time.  We  have  plotted  pressure  vs.  specific  volume  along  with  the  unburned  and 
burned  Hugoniot  curves.  It  is  plotted  at  a  time  when  the  radius  of  curvature  is  approximately 
50  times  the  length  of  the  reaction  zone.  The  steady  state  planar  wave  is  initiated  by  a  shock 
and  moves  down  the  Rayleigh  line  from  A  to  CJ  (the  CJ  point)  as  the  reaction  progresses. 
This  line,  when  extended,  passes  through  the  point  representing  the  initial  ahead  state.  The 
detonation waves for the two cylindrical methods are initiated by weaker shocks, corresponding
to a lower pressure on the unburned Hugoniot (points F and D), and move down along
the  curves  shown  to  a  weak  detonation.  These  curves  do  not  terminate  on  the  burned 
Hugoniot  curve  since  it  is  computed  from  a  planar  theory.  Wood  and  Kirkwood  [14]  have 
derived  equations  for  the  modifications  of  Hugoniot  curves  in  a  curved  geometry.  From  Fig. 
4e, we see the effect of curvature on the pressure just behind the shock and through the reaction
zone. These computations show that a 12.5-17.5% smaller jump in pressure at the shock
occurs  in  the  curved  geometry  than  in  the  corresponding  planar  calculation.  Here  the  radius 
of  curvature  was  approximately  50  times  the  reaction  zone  width. 

5.  Conclusions 

We have shown that the corrections due to curvature derived by J. Jones [10] for
the speed of detonation waves and the pressure behind the shock wave are correct to first
order in the inverse of the radius of curvature. This validation was necessary since the passage
from  the  original  system  of  partial  differential  equations  (2.1)  to  the  ordinary  differential 
equations  (3.2)  has  not  been  shown  rigorously. 

The advantage of using the equations of Jones is considerable. Solving for the wave
speed and the states of the reacting gas is usually accomplished in less than 10 CPU seconds
on the ELXSI. We note that the rapid computation of wave speeds will enable two dimensional
front tracking computations to be extended to include the effects of curvature on detonation
waves in the near future. Although not directly comparable, the one dimensional
computations with fully resolved chemical reactions, fine grids and Arrhenius kinetics,
exhibited in Figs. 4a-e, require approximately 40 hours of CPU time on the same machine.
To avoid these slow computations, for practical computations one would normally use neither
resolved chemical reactions, nor fine grids, nor Arrhenius kinetics.

The enhancement of the front tracking method ([3], [12]) using Jones’ equations would
involve modelling the reaction zone as being infinitely thin. Based on the divergence of the
flow at each point on the detonation wave front, one would use the theory of Jones to find
the speed of propagation of the detonation front at that point and the state of the gas behind
the initiating shock wave as well as the state of the gas behind the completed chemical reaction.
In this way, the transition from strong or CJ detonations to weak detonations can be
modelled for two dimensional flows. This is not possible with the model employed in [2].


Fig. 2a. A 1D Stable Computation. A plot of pressure vs. distance
is shown for a planar detonation initialized with the steady
state ZND solution. The initialized reaction and the reaction
1200 time steps later are superimposed so that both fronts are at
the same location on the graph. The solution is clearly not deformed
by the numerical method. The state ahead of the initiating
shock (in units where the gas constant R = 1) has P = 100,
u = 0, ρ = 1.4. The heat release, q, is 300; E = 100; T_c = 215
and γ = 1.1. The distance from the initiating shock wave to the
point where the gas is 90% burned is 0.75 and the grid spacing
is h = 0.0125.
Fig. 2b. A 1D Unstable Computation. A plot of pressure
behind the shock initiating the detonation vs. time for the data
described in [11, pp. 18-19]. The reaction zone was initialized
with length 1.75. The ahead state has P = 1, u = 0, ρ = 1,
q = 50, E = 50, γ = 1.2, k = 206. The speed of the initialized
wave is 1.265 times the CJ wave speed for the ahead state. Grid
spacing is 0.05. The results are similar to those in [6] and [11].
Fig. 3a. Phase Plane Portrait of the Reaction using Jones’
Equations. A plot of the trajectory through the sonic point (S)
in the v-λ plane for the runs in the following figures when the
radius of curvature is 50 times the reaction zone length. The
sonic locus is also shown. A corresponds to the point behind the
initiating shock wave while B represents the termination of
the reaction as a weak detonation.
Fig. 4a. Effect of Curvature on Pressure behind the Initiating
Shock. A plot of pressure vs. time for planar and cylindrical
computations using the random choice method of §2 and the solution
to Jones’ equations where the radius of curvature is assumed
to be the same as for the cylindrical run by random choice. The
error bars show the range of values over each 100 time steps for
the random choice calculations. The ahead state has
P = 300, u = 0, ρ = 1.4. The reaction zone has length J,
q = 300, T_c = 215, γ = 1.1, E = 100 and grid spacing 0.01.


Fig. 4b. Comparison of Initializations. The panels above show
plots of pressure behind the initiating shock as a function of time
for the same cylindrical expanding detonation problem as in
Fig. 4a using the planar steady state initialization (a) and initialization
by solution to Jones’ method at a small radius (b). These
two plots are superimposed in (c).
Fig. 4c. First Order Corrections to Pressure. Pressure behind
the initiating shock wave is plotted against inverse radius for the
cylindrical computations of Fig. 4a. Also shown is the leading
order correction predicted by solving Jones’ equations for very
large radii.
Fig. 4e. Effect of Curvature on the Hugoniot Diagram. Pressure
is plotted against specific volume for the calculations used
in Figs. 4a-d. The unburned and burned Hugoniot curves are
presented as well as the path through the reaction zone for a
planar detonation (from A to CJ) and cylindrical detonations
where the radius of curvature is approximately 50 times the
reaction zone length by the random choice method (from D to E)
and by Jones’ method (from F to G). The pressure jump at the
front is reduced by the curvature.
Abstract

The  impact  of  a  mass  on  a  structural  plate  of  a  compartment 
causes  plastic  and  elastic  deformations  which  can  give  rise  to  pressure 
fluctuations  of  significant  magnitude  and  duration  due  to  the  confined 
nature  of  the  compartment  and  the  low  damping  forces  in  the  gas. 
This  paper  presents  a  procedure  for  calculating  the  pressure  transients 
in  an  enclosed  gas  from  an  impact. 

The  procedure  uses  a  formulation  which  gives  the  equations  of 
motion  in  terms  of  a  scalar  momentum  potential.  This  momentum 
potential  is  physically  interpretable  as  a  pressure  impulse.  With  this 
formulation  the  transient  pressure  behavior  of  the  gas  is  characterized 
by  a  single  partial  differential  equation  which  is  the  wave  equation 
in  three  dimensions.  The  spatial  derivatives  are  treated  by  a  finite 
element  technique  to  obtain  solutions  for  an  arbitrary  geometry  of 
the  enclosure. 

The boundary conditions for the problem are that the normal velocity
of the gas is compatible with the prescribed velocity of the enclosure
walls. The rate of deformation of the wall resulting from the
impact  is  approximated  by  modelling  the  region  in  the  form  of  two 
concentric  plastic  hinges.  Since  the  stress  in  the  hinges  must  be  at 
the  yield  value,  it  is  possible  to  approximate  the  plastic  deformation 
and  hence  the  time  history  of  the  deformation. 
1  Introduction


The  subject  of  a  projectile  penetrating  the  wall  of  a  cavity  has  received  a  great  deal 
of  attention  because  of  possible  military  applications.  A  projectile  which  failed  to 
perforate  was  considered  of  little  interest.  However  there  is  increased  interest  in  the 
transient  response  of  an  enclosed  gas  resulting  from  a  mass  impacting  a  cavity  wall 
since it may cause sufficient deflection of the wall to give rise to pressure fluctuations
of  significant  magnitude  and  duration. 

This paper presents a procedure for calculating the pressure transients in an
enclosed gas resulting from a deformation of the cavity wall. The suitability of this
procedure for investigating pressure transients resulting from a projectile impact
on the cavity wall is investigated.

2  Equations  of  Motion  of  a  Gas 


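The force balance on an element of the gas is shown in figure 1. The components
of displacement in the x_1, x_2, x_3 directions are denoted as u_1, u_2, u_3 respectively and
the pressure is denoted by p.

Starting from the force balance in the x_1-direction we obtain

$$\left(p - \left(p + \frac{\partial p}{\partial x_1}dx_1\right)\right)dx_2\,dx_3 = \rho\,dx_1\,dx_2\,dx_3\,\frac{\partial^2 u_1}{\partial t^2} \qquad (1)$$

which yields

$$-\frac{\partial p}{\partial x_1} = \rho\frac{\partial^2 u_1}{\partial t^2} \qquad (2)$$

which can be generalized for the i-th direction

$$-\frac{\partial p}{\partial x_i} = \rho\frac{\partial^2 u_i}{\partial t^2} \qquad (3)$$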

These represent the equilibrium equations of the gas. This formulation has the
disadvantage that one can have infinitely many solutions for which u_1, u_2, u_3 are not zero
whereas the volumetric strain is zero (i.e. spurious solutions).


Impulse Formulation  In order to reduce the problem to a convenient form for solving, the pressure impulse formulation of refs. [1] and [2] (i.e. $q = \int p\,dt$) is used.

The cause-effect relationship involving the pressure impulse and the volumetric strain is shown here as the rate of change of the pressure impulse (the pressure itself) being equal to minus the bulk modulus ($K$) times the volumetric strain ($\epsilon_{vol}$).

$$\dot q = p = -K\,\epsilon_{vol} = -K\left(\frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2} + \frac{\partial u_3}{\partial x_3}\right) \qquad (4)$$

This represents the constitutive equation of the gas in terms of the pressure impulse.


Differentiating eq. 3 with respect to $x_1$, $x_2$, $x_3$ respectively and substituting into eq. 4, we obtain eq. 5

$$\nabla^2 q = \frac{\ddot q}{c^2} \qquad (5)$$

where $c^2 = K/\rho$.


On a fixed rigid wall, the normal displacement and hence also the normal velocity must vanish. For a surface which has a prescribed velocity $V_n$, the normal component of the velocity of the gas and the wall must match.

Therefore it follows from eq. 3 that

$$\frac{\partial q}{\partial n} = \rho V_n \qquad (6)$$

This formulation has the advantage that it requires the solution of only one equation and that the boundary conditions are in terms of velocity. Analytical solutions to eq. 5 and eq. 6 can be found by classical means for cases where the physical problem has a simple geometry and boundary condition.


3  Finite  Element  Model 


Since  we  wish  to  be  able  to  model  cavities  of  arbitrary  geometry,  it  is  necessary  to 
solve  the  equation  using  finite  elements.  For  this  purpose  it  is  useful  to  cast  the 
essential  equations  of  the  problem  into  a  variational  statement. 

Using the complementary energy principle, the increment of work done by violation of eq. 5 and eq. 6 results in eq. 7.

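$$\int_V \delta q\left(\nabla^2 q - \frac{\ddot q}{c^2}\right)dV - \int_S \left(\frac{\partial q}{\partial n} - \rho V_n\right)\delta q\,ds = 0 \qquad (7)$$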

Using  Green’s  theorem,  integrating  with  respect  to  time  and  integrating  by  parts 
we  obtain  the  required  variational  statement 


The first integral is over the volume of the gas and the second integral is over that part of the surface where the normal velocity $V_n$ is prescribed.

For use in cavities of arbitrary geometry an 8-noded isoparametric element (ref. [3]) as shown in figure 2 is used. The six faces of the element can be quadrilaterals of arbitrary shape; however, the 4 edges of each quadrilateral must be straight. The shape functions

$$\begin{aligned}
N_1 &= \tfrac18(1-\alpha_1)(1-\alpha_2)(1-\alpha_3)\\
N_2 &= \tfrac18(1-\alpha_1)(1+\alpha_2)(1-\alpha_3)\\
N_3 &= \tfrac18(1+\alpha_1)(1+\alpha_2)(1-\alpha_3)\\
N_4 &= \tfrac18(1+\alpha_1)(1-\alpha_2)(1-\alpha_3)\\
N_5 &= \tfrac18(1-\alpha_1)(1-\alpha_2)(1+\alpha_3)\\
N_6 &= \tfrac18(1-\alpha_1)(1+\alpha_2)(1+\alpha_3)\\
N_7 &= \tfrac18(1+\alpha_1)(1+\alpha_2)(1+\alpha_3)\\
N_8 &= \tfrac18(1+\alpha_1)(1-\alpha_2)(1+\alpha_3)
\end{aligned} \qquad (9)$$

transform the $x_1$, $x_2$, $x_3$ coordinates into the $\alpha_1$, $\alpha_2$, $\alpha_3$ coordinate system such that the 8-node element is transformed into a regular cube. Since the elements are isoparametric, the same shape functions are used to interpolate the impulses (eq. 10) as are used to transform the coordinates.
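As an illustration only, the trilinear shape functions of eq. 9 and their derivatives with respect to the natural coordinates can be sketched as follows. The node ordering is assumed to be the one listed in eq. 9, and the function names `shape_functions` and `shape_gradients` are hypothetical, not taken from the original program.

```python
import numpy as np

# Signs of (alpha_1, alpha_2, alpha_3) at nodes 1..8, in the order assumed from eq. 9.
NODE_SIGNS = np.array([
    [-1, -1, -1],
    [-1, +1, -1],
    [+1, +1, -1],
    [+1, -1, -1],
    [-1, -1, +1],
    [-1, +1, +1],
    [+1, +1, +1],
    [+1, -1, +1],
], dtype=float)


def shape_functions(a):
    """Return N_1..N_8 evaluated at natural coordinates a = (a1, a2, a3)."""
    a = np.asarray(a, dtype=float)
    return np.prod(1.0 + NODE_SIGNS * a, axis=1) / 8.0


def shape_gradients(a):
    """Return the 3 x 8 array of dN_j / d(alpha_i) at natural coordinates a."""
    a = np.asarray(a, dtype=float)
    terms = 1.0 + NODE_SIGNS * a  # shape (8, 3)
    grads = np.empty((3, 8))
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        grads[i] = NODE_SIGNS[:, i] * terms[:, j] * terms[:, k] / 8.0
    return grads
```

Each function equals one at its own node and zero at the other seven, and the eight functions sum to one everywhere in the cube, which is what allows a constant impulse field to be represented exactly.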

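$$q(\alpha_1,\alpha_2,\alpha_3) = [N_1, N_2, \ldots, N_8]\begin{Bmatrix} q_1\\ q_2\\ \vdots\\ q_8 \end{Bmatrix} \qquad (10)$$

The impulse gradient can be expressed in terms of the nodal pressure impulses.

$$\left\{\frac{\partial q}{\partial \alpha}\right\} = [B]\{q\} \qquad (11)$$

The Jacobian matrix of transformation, represented here in symbolic form by the letter $J$, allows us to transform the impulse gradient from the $x_1$, $x_2$, $x_3$ coordinate system to the $\alpha_1$, $\alpha_2$, $\alpha_3$ coordinate system.

$$\left\{\frac{\partial q}{\partial \alpha}\right\} = [J]\left\{\frac{\partial q}{\partial x}\right\} \qquad (12)$$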

The variational statement (eq. 8) has three terms. The first, shown in eq. 13, can be reduced and integrated, using eq. 11 and eq. 12, to produce the stiffness matrix $[K_e]$.

$$\iiint \frac{1}{2\rho}\left\{\frac{\partial q}{\partial x}\right\}^T\left\{\frac{\partial q}{\partial x}\right\}dx_1\,dx_2\,dx_3 = \iiint \frac{1}{2\rho}\{q\}^T[B]^T[J^{-1}]^T[J^{-1}][B]\{q\}\,|J|\,d\alpha_1\,d\alpha_2\,d\alpha_3 = \frac12\{q\}^T[K_e]\{q\} \qquad (13)$$

It is important to bear in mind that in eq. 13 we have the discrete form of the kinetic energy and the dimensions of the elements of $[K_e]$ are velocity per unit momentum.

Consider the second term in the functional given in eq. 8. With a change in variables and integrating numerically using the Gauss-Legendre method one obtains eq. 14.

$$\iiint \frac{1}{2K}\,\dot q^2\,dx_1\,dx_2\,dx_3 = \{\dot q\}^T\iiint \frac{1}{2K}[N]^T[N]\,|J|\,d\alpha_1\,d\alpha_2\,d\alpha_3\,\{\dot q\} = \frac12\{\dot q\}^T[M_e]\{\dot q\} \qquad (14)$$

$[M_e]$ is referred to as the element mass matrix but it should be noted that the dimensions of the elements of $[M_e]$ are displacements per unit force. In reality $[M_e]$ is the flexibility matrix.
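A minimal sketch of how eq. 13 and eq. 14 might be evaluated with 2 x 2 x 2 Gauss-Legendre quadrature is given below. It assumes the shape functions and node ordering of eq. 9, a uniform density `rho` and bulk modulus `bulk` within the element, and the hypothetical names `shape` and `element_matrices`; it is not the authors' program.

```python
import numpy as np

SIGNS = np.array([[-1, -1, -1], [-1, 1, -1], [1, 1, -1], [1, -1, -1],
                  [-1, -1, 1], [-1, 1, 1], [1, 1, 1], [1, -1, 1]], dtype=float)


def shape(a):
    """Shape functions N (8,) and natural-coordinate gradients B (3, 8)."""
    t = 1.0 + SIGNS * a
    N = t.prod(axis=1) / 8.0
    B = np.stack([SIGNS[:, i] * np.delete(t, i, axis=1).prod(axis=1) / 8.0
                  for i in range(3)])
    return N, B


def element_matrices(xyz, rho, bulk):
    """Return (K_e, M_e) for one 8-node element with nodal coordinates xyz (8, 3)."""
    g = 1.0 / np.sqrt(3.0)  # 2-point Gauss-Legendre abscissa; both weights equal 1
    Ke = np.zeros((8, 8))
    Me = np.zeros((8, 8))
    for a1 in (-g, g):
        for a2 in (-g, g):
            for a3 in (-g, g):
                N, B = shape(np.array([a1, a2, a3]))
                J = B @ xyz                  # J[i, j] = d x_j / d alpha_i
                detJ = np.linalg.det(J)
                G = np.linalg.solve(J, B)    # gradients with respect to x, (3, 8)
                Ke += (G.T @ G) * detJ / rho
                Me += np.outer(N, N) * detJ / bulk
    return Ke, Me
```

For a unit cube, `xyz = (SIGNS + 1) / 2`. A constant nodal impulse then gives `Ke @ ones = 0`, which is the zero-frequency mode, and the entries of `Me` sum to the element volume divided by the bulk modulus.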

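Similarly the third term can be evaluated to produce the virtual work of the prescribed velocities in discrete form as follows

$$\int_S V_n\,\delta q\,ds = \iint V_n\,\delta q\,\sqrt{1 + \left(\frac{\partial x_3}{\partial x_1}\right)^2 + \left(\frac{\partial x_3}{\partial x_2}\right)^2}\;dx_1\,dx_2$$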

=  //  <Wl 

where $\{N_s\}$ = shape function on the face where $V_n$ is prescribed

$[B_s]$ = gradient of shape function on the face where $V_n$ is prescribed

$[J_s]$ = Jacobian on the face where $V_n$ is prescribed

Integrating using the Gauss-Legendre method gives

$$\int_S V_n\,\delta q\,ds = \delta\{q\}^T\{V_e\} \qquad (15)$$

Having found the discrete forms of the kinetic energy, the complementary strain energy and the virtual work of the prescribed velocities, we can write the discrete form of the Complementary Energy principle.

$$\delta\sum_e\int_t\left(\frac12\{q\}^T[K_e]\{q\} - \frac12\{\dot q\}^T[M_e]\{\dot q\} - \{q\}^T\{V_e\}\right)dt = 0$$

Expressing the functional in terms of the global matrices and carrying out the extremization we find the discrete equations of motion.

$$[K_g]\{q\} + [M_g]\{\ddot q\} = \{V_g\} \qquad (16)$$

To check that the element matrices are correctly computed and assembled, the natural frequencies of a cavity, modelled by different numbers of elements, were computed. The test cavity is a unit cube for which the natural frequencies can be determined analytically.

The first frequency is the zero frequency associated with a mode wherein the impulse $q$ is constant, and the modes associated with the first two non-zero frequencies are standing waves for $q$ in the form of a cosine function.

The results of the investigation are shown in Fig. 3 as the convergence of the computed natural frequencies to the known natural frequencies as a function of the number of elements in the $x_1$-direction. These results verify the correctness of the element matrices as well as the connection process and give a good indication of the order of accuracy one can expect from eight-noded elements.
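For reference, the analytical frequencies against which such a check can be made may be sketched as below. This assumes all six walls are rigid, so that the normal derivative of $q$ vanishes on them and the modes are products of cosines with circular frequency $\omega = c\,\pi\sqrt{l^2+m^2+n^2}/a$ for a cube of side $a$. The function name and the sound speed used in the usage note are illustrative assumptions.

```python
import itertools
import math


def cube_frequencies(c, side=1.0, max_index=2):
    """Circular natural frequencies of a rigid-walled cubic cavity.

    Returns a sorted list of (omega, (l, m, n)) for mode shapes
    cos(l*pi*x1/side) * cos(m*pi*x2/side) * cos(n*pi*x3/side).
    """
    freqs = []
    for l, m, n in itertools.product(range(max_index + 1), repeat=3):
        omega = c * math.pi * math.sqrt(l * l + m * m + n * n) / side
        freqs.append((omega, (l, m, n)))
    return sorted(freqs)
```

With `cube_frequencies(340.0)`, the first entry is the zero frequency of the constant mode `(0, 0, 0)`, followed by three equal frequencies, one cosine standing wave along each axis.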

Solution of Transient Equation  The matrix eq. 16 can be integrated using a moving polynomial solution

$$\{q\} = \{a_1\} + \{a_2\}t + \{a_3\}t^2 \qquad (17)$$

Choosing equally spaced previous values of $\{q\}$, it is possible to solve for the $\{a\}$'s of eq. 17 and substitute into eq. 16, which results in eq. 18.

$$\left(\frac{1}{\Delta t^2}[M_g] + [K_g]\right)\{q_n\} = \{V_g\}_n + \frac{1}{\Delta t^2}[M_g]\left(2\{q_{n-1}\} - \{q_{n-2}\}\right)$$

or

$$[A]\{q_n\} = \{b\} \qquad (18)$$

The response of the model is dependent on the number of elements used. To evaluate these effects we consider again the unit cube made up of two elements in the $x_2$ and $x_3$ directions and a variable number of elements in the $x_1$ direction. A velocity is imposed on the central node of the $x_2$, $x_3$ face of the cube, which is initially at rest at time zero. The results are shown in Fig. 4 as the maximum pressure in the cavity as a function of time for different numbers of elements in the $x_1$ direction.

The response of the system will also vary with the time step size (e.g. through numerical damping). To illustrate this a unit cube is again subjected to a velocity at its center node. The results are shown in Fig. 5 as the maximum pressure in the cavity versus time for various time steps.
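The time-step dependence can be illustrated with a small sketch of a backward-difference recurrence of the kind in eq. 18. It assumes that the quadratic of eq. 17 is fitted through the current and the two previous values, so that the second derivative is approximated by $(q_n - 2q_{n-1} + q_{n-2})/\Delta t^2$. The single-degree-of-freedom test problem and all names are hypothetical and are not taken from the paper.

```python
import numpy as np


def march_backward_difference(K, M, load, q0, q1, dt, n_steps):
    """Advance K q + M q'' = load(t) from two starting values q0 (t=0), q1 (t=dt)."""
    A = M / dt**2 + K
    A_inv = np.linalg.inv(A)  # formed once, since dt is kept constant
    history = [np.asarray(q0, dtype=float), np.asarray(q1, dtype=float)]
    for n in range(2, n_steps + 1):
        rhs = load(n * dt) + M @ (2.0 * history[-1] - history[-2]) / dt**2
        history.append(A_inv @ rhs)
    return np.array(history)


def late_peak(dt, t_end=20.0):
    """Largest |q| in the second half of a free vibration with unit frequency."""
    K = np.array([[1.0]])
    M = np.array([[1.0]])
    n_steps = int(round(t_end / dt))
    q = march_backward_difference(K, M, lambda t: np.zeros(1),
                                  [1.0], [1.0], dt, n_steps)
    return np.abs(q[n_steps // 2:]).max()


peak_fine = late_peak(0.05)
peak_coarse = late_peak(0.2)
```

The undamped oscillator should keep its amplitude, but the scheme removes energy at a rate that grows with the time step, so `peak_coarse` comes out smaller than `peak_fine`. This is the numerical damping referred to above.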

Due to the short time required for the cavity wall to reach its maximum velocity, it is essential that the model be capable of simulating the higher frequency components of the gas. This, along with the fact that the mesh must be fine enough to adequately represent the localized applied velocity, makes it necessary to use a fine mesh and a small time step. The large number of elements will require a large amount of computer storage, but the matrix $[A]$ in eq. 18 need only be inverted once if the integration time step is kept constant.

4  Application to Cavity Impact


The pressure pulse arises from the boundary condition that the normal velocity of the gas is compatible with that of the enclosing wall. Since this analysis was developed for the purpose of estimating pressure transients in a cavity resulting from an object impacting the cavity wall, a procedure for approximating the boundary conditions resulting from an impact has been included.

Fig. 6 shows a blunt object striking a cavity wall at normal incidence. The deformation process is divided into two phases. In the first phase the bulge, modelled by two concentric hinges, accelerates until it attains the projectile's velocity. The second phase consists of the hinge and projectile decelerating together.

It is possible to approximate the velocity $V_0$ and the times $T_0$ and $T_1$ using a procedure similar to that used in ref. [4]. The thrust on the target, $F$, is given by

$$F = \left[\sigma_Y + (V_p - V_b)^2\rho\right]A_p$$

where $A_p$ = cross-sectional area of the projectile

$V_p$ = velocity of the projectile

$V_b$ = velocity of the bulge

$\sigma_Y$ = constrained uniaxial yield stress

This allows us to approximate the velocity history of the projectile and the cavity wall during the impact.

Fig. 7 shows the results for a 3 kg projectile with a 60 mm diameter and a velocity of 1 km/sec striking a hemispherical cavity with a 1 m radius and a thickness of 30 mm. The results are shown here as the pressure at the point of impact and at a point located at the center of the cavity versus time. The results show that severe pressures of very short duration will result.


5  Conclusion 

In this paper a procedure for calculating the pressure transients in a cavity using a finite element method has been demonstrated. The results shown in this paper suggest that the procedure is suitable for determining the pressure transients in an armoured vehicle subjected to an impact, provided that the striking projectile does not perforate the cavity wall and that the velocity of the wall is below the speed of sound in the gas. This is generally the case when the velocity of the projectile is below the ballistic limit of the cavity wall.

Figure 3. Convergence of 2nd and 3rd eigenvalues.

Figure 4. Effect of mesh size on response to unit step.

Figure 5. Effect of time step on response to unit step.

Figure 6. Idealization of impact.

Figure 7. Response of hemispherical cavity to an impact.

Abstract

A finite element method introduced in [1] for computing Bueckner-Rice weight functions in finite bodies is described and the singular fields for a semi-infinite crack in an elastic and anisotropic body are given. These singular fields provide the required information for the computation of weight functions in anisotropic bodies under plane deformations.


1.  Introduction 

A  finite  element  procedure,  introduced  in  [1],  has  provided  a  unified  approach  in  computing  the 
Bueckner-Rice  weight  functions  for  all  three  fracture  modes  under  either  displacement,  traction  or  mixed 
boundary  conditions.  This  finite  element  procedure  [1]  is  simple  to  implement  and  the  results  [1]  obtained  for 
two  dimensional  isotropic  cracked  solids  are  very  accurate.  In  this  study,  this  recently  developed  two  and  three 
dimensional  finite  element  method  is  applied,  as  particular  cases,  to  determining  the  weight  functions  in 
anisotropic  bodies  under  plane  deformations. 

The  synopsis  of  this  paper  is  as  follows.  We  first  summarize  in  section  2  the  finite  element  procedure 
introduced  in  [1]  for  determining  the  weight  functions.  This  finite  element  method  is  valid  for  both  two  and 
three  dimensional  problems;  however,  in  this  paper,  we  shall  concentrate  on  its  two  dimensional  aspects.  In 
section  3,  we  present  the  weight  functions  for  a  semi-infinite  crack  in  an  anisotropic  full  space  which  are 
required  in  the  finite  element  procedure  [1]  for  computing  weight  functions  in  anisotropic  bodies. 


2.  Finite  Element  Method  for  Determining  Weight  Functions  in  Finite  Bodies 

Consider a two dimensional cracked body containing a single crack or a system of cracks. Let $P$ be the specific crack tip at which we wish to determine the stress intensity factors. A crack tip Cartesian coordinate system centered at $P$ is employed with $\mathbf{e}_i$ being a set of unit base vectors. Roman subscripts have range 1 to 3 and the summation convention is employed unless otherwise stated. We shall also use an in-plane polar coordinate system $(r,\theta)$ centered at the crack tip. Generalized plane deformation is assumed so that the stresses and strains are functions of the in-plane coordinates only. The stress intensity factors for an anisotropic solid can be defined by the traction vector acting on the plane directly ahead of the crack front as

$$K_i = \lim_{x_1\to 0^+}\sqrt{2\pi x_1}\;\sigma_{i2}(x_1,0) \qquad (1)$$

where $K_1$, $K_2$, $K_3$ are the mode II (in-plane shear mode), mode I (in-plane opening mode) and mode III (out-of-plane shear mode) stress intensity factors, respectively. The weight functions corresponding to the crack tip $P$ will be denoted as $\mathbf{h}_i(\mathbf{x};P)$ and they are vector-valued functions of position $\mathbf{x}$. Under mixed boundary conditions and body force loading, the stress intensity factors at $P$ can be computed by [1,2]


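$$K_i = \int_{S_T}\mathbf{T}\cdot\mathbf{h}_i\,dA + \int_{S_u}\mathbf{t}_i\cdot\mathbf{U}\,dA + \int_V\mathbf{F}\cdot\mathbf{h}_i\,dV \qquad (2)$$

where $\mathbf{T}$ and $\mathbf{U}$ are the prescribed surface tractions and boundary displacements on the boundaries $S_T$ and $S_u$ respectively; $\mathbf{F}$ is the prescribed body force field in the body with volume $V$; and $\mathbf{t}_i$ are the tractions generated by the weight functions $\mathbf{h}_i$, each interpreted as a displacement field, on the boundary $S_u$.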

The weight functions $\mathbf{h}_i$ (Bueckner [3,4] and Rice [5]) are universal functions for a given crack configuration, body geometry and material properties and are independent of loading systems. The stress field of $\mathbf{h}_i$ is in equilibrium with zero body force and it generates zero traction on all the crack faces and on the external boundary $S_T$. On the external boundary $S_u$ where displacements are prescribed, the weight functions $\mathbf{h}_i$ are zero. The weight functions yield elastic fields which give rise to unbounded energy for any finite region encompassing the crack tip $P$.

† This research was supported by NASA and AFOSR under NASA Grant NGL 33-018-003.

Let $S_{int}$ be a suitably small bounding surface in the body which isolates the crack tip $P$ from the rest of the body. The part of the body inside the surface is referred to as region $B$ and the remaining part of the body is denoted as region $A$. The unit outward normal to the bounding surface of region $B$ is $\mathbf{n}$. In region $B$, the weight functions $\mathbf{h}_i$ are decomposed into modified singular displacements $\mathbf{b}_i^s$ and modified regular displacements $\mathbf{b}_i^r$, viz

$$\mathbf{h}_i = \mathbf{b}_i^s + \mathbf{b}_i^r \qquad (3)$$

The modified singular displacements $\mathbf{b}_i^s$ are constructed so that

(i) they admit the same singularity as the weight functions $\mathbf{h}_i$ at the crack tip $P$; and

(ii) they generate zero traction on the crack faces inside region $B$.

The modified regular displacements $\mathbf{b}_i^r$ are taken to be bounded at the crack tip $P$ and they can be identified as the displacements in elastic crack analyses. The singular stress fields, $\boldsymbol{\sigma}_i^s$, of $\mathbf{b}_i^s$ and the regular stress fields, $\boldsymbol{\sigma}_i^r$, of $\mathbf{b}_i^r$ are self-equilibrating and they generate zero tractions on the crack faces inside region $B$.

It is noted that the modified singular displacements $\mathbf{b}_i^s$ defined as such do not lend themselves to a unique construction in general. Indeed, it is the non-uniqueness in their construction which allows us to make judicious choices of $\mathbf{b}_i^s$: we can choose $\mathbf{b}_i^s$ to correspond to the simplest possible crack geometry.


2.1.  Variational Principle for $\mathbf{h}_i$ and $\mathbf{b}_i^r$

The weight functions $\mathbf{h}_i$ in region $A$ and the modified regular displacements $\mathbf{b}_i^r$ in region $B$ of the cracked body under consideration can be determined by the finite element method introduced in [1]. This finite element method is based on the following minimum principle [1].

Define a functional $\Pi$ as

$$\Pi[\mathbf{h}_i,\mathbf{b}_i^r] = \int_A w(\boldsymbol{\epsilon}_i)\,dV + \int_B w(\boldsymbol{\epsilon}_i^r)\,dV - \int_{S_{int}}(-\boldsymbol{\sigma}_i^s\mathbf{n})\cdot\mathbf{b}_i^r\,dA \quad (\text{no sum on } i) \qquad (4)$$

where $\boldsymbol{\epsilon}_i$ and $\boldsymbol{\epsilon}_i^r$ are the strain fields corresponding to $\mathbf{h}_i$ in $A$ and $\mathbf{b}_i^r$ in $B$ respectively and $w$ is the elastic strain energy density. The functional $\Pi$ is bounded and it is a functional of $\mathbf{h}_i$ in region $A$ and $\mathbf{b}_i^r$ in region $B$. It has been proven in [1] that among all possible fields $\mathbf{h}_i$ in region $A$ and $\mathbf{b}_i^r$ in region $B$ which

(i) satisfy the strain-displacement relations; and

(ii) make $\mathbf{h}_i$ zero on the external boundary $S_u$ and equal to the sum of $\mathbf{b}_i^r$ and $\mathbf{b}_i^s$ on the internal boundary $S_{int}$, where $\mathbf{b}_i^s$ are considered to be given;

the true fields $(\mathbf{h}_i)^*$ in region $A$ and $(\mathbf{b}_i^r)^*$ in region $B$ minimize the functional $\Pi$.

An implementation of this variational principle within the context of a displacement-based finite element method is given in [1] and this implementation can be incorporated into standard linear elastic finite element programs with little programming effort. This procedure is very similar to standard finite element methods and it involves prescribing

(i) nodal forces corresponding to the tractions $-\boldsymbol{\sigma}_i^s\mathbf{n}$ on the internal boundary $S_{int}$;

(ii) nodal "effective" body forces [1] for the elements in region $A$ which are adjacent to the internal boundary $S_{int}$; and

(iii) zero tractions and displacements on the external boundaries $S_T$ and $S_u$ respectively.

The nodes inside region $B$, including those on $S_{int}$, are interpreted as nodal unknowns for the modified regular displacements $\mathbf{b}_i^r$ and the remaining nodes represent the nodal values of the weight functions $\mathbf{h}_i$.

Thus, in employing this procedure to determine weight functions in two dimensional anisotropic bodies, we first have to choose some modified singular displacements $\mathbf{b}_i^s$ which satisfy the conditions delineated above. The simplest candidates for this purpose are the weight functions of a semi-infinite crack in an anisotropic full space. They are given in the following section by way of Rice's [5-7] crack front variation approach.

3.  Weight Functions for a Semi-Infinite Crack in an Anisotropic Full Space

First, consider a crack in an elastic anisotropic solid of finite extent. The energy release rate $G$ for the anisotropic solid may be expressed in terms of the stress intensity factors as [6-8]

$$G = K_i\,\Lambda_{ij}\,K_j \qquad (5)$$

where $\Lambda$ is symmetric and positive definite; it depends only on the elastic constants; and it can be related to the pre-logarithm energy factor of a straight dislocation line in an uncracked solid, lying parallel to the crack front in the cracked body. The energy release rate $G$ can also be decomposed additively into three components, $G_i$, according to Irwin's concept of virtual crack extension [9] as

$$G_i = \lim_{\delta a\to 0}\frac{1}{2\,\delta a}\int_0^{\delta a}\sigma_{i2}(x_1,0)\left[u_i(x_1-\delta a,0^+) - u_i(x_1-\delta a,0^-)\right]dx_1 \quad (\text{no sum on } i)$$

where $\sigma_{i2}(x_1,0)$ is the traction acting on the $\mathbf{e}_2$-plane ahead of the crack tip and $u_i(x_1-\delta a,0^+)$ and $u_i(x_1-\delta a,0^-)$ are the displacements on the upper and lower crack faces respectively. It is customary to refer to $G_1$, $G_2$ and $G_3$ respectively as the mode II, mode I and mode III energy release rates. Thus, for an anisotropic body, $G_i$ can be related to the stress intensity factors via

$$G_i = \sum_{j=1}^{3} K_i\,\Lambda_{ij}\,K_j \quad (\text{no sum on } i)$$

For an isotropic body, $\Lambda$ is diagonal and eqn (5) reduces to the familiar Irwin relation

$$G = \frac{1-\nu^2}{E}\left(K_1^2 + K_2^2\right) + \frac{1+\nu}{E}K_3^2 \qquad (6)$$

under plane strain conditions. Here, $\nu$ is the Poisson's ratio and $E$ is the Young's modulus.
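As a small numerical illustration, the quadratic form of eqn (5) can be checked against the isotropic relation (6). The diagonal matrix used below is the one implied by eqn (6) under plane strain. The function names and the material values in the usage note are illustrative assumptions.

```python
import numpy as np


def lambda_isotropic(E, nu):
    """Diagonal energy matrix implied by eqn (6) for plane strain isotropy."""
    return np.diag([(1.0 - nu**2) / E, (1.0 - nu**2) / E, (1.0 + nu) / E])


def energy_release_rate(K, Lam):
    """Total energy release rate G = K_i Lam_ij K_j, eqn (5)."""
    K = np.asarray(K, dtype=float)
    return float(K @ Lam @ K)


def energy_release_components(K, Lam):
    """Components G_i = K_i * sum_j Lam_ij K_j (no sum on i)."""
    K = np.asarray(K, dtype=float)
    return K * (Lam @ K)
```

For example, with `E = 200.0e9`, `nu = 0.3` and `K = [1.0e6, 2.0e6, 0.5e6]` (ordered as mode II, mode I, mode III, following the convention of eqn (1)), the three components sum to the total `G`.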

Adopting Rice's [5] crack front variation concept, let the anisotropic cracked solid be subjected to two linearly independent loading systems denoted by $I$ and $II$ respectively. $Q^I$ and $Q^{II}$ are the generalized loads and $q^I$ and $q^{II}$ are the corresponding work conjugate generalized displacements for the two loading systems. Suppose both loading systems are applied to the cracked body simultaneously. The change in strain energy due to virtual displacements $\delta q^I$ and $\delta q^{II}$, and virtual crack front variation $\delta a$ at fixed external forces is

$$\delta U = Q^I\,\delta q^I + Q^{II}\,\delta q^{II} - K_i\,\Lambda_{ij}\,K_j\,\delta a \qquad (7)$$

where $U$ is the strain energy and $a$ is the crack length. From linearity, we may write

$$\begin{aligned}
K_i &= k_i^I(a)\,Q^I + k_i^{II}(a)\,Q^{II}\\
q^I &= C_{I,I}(a)\,Q^I + C_{I,II}(a)\,Q^{II}\\
q^{II} &= C_{II,I}(a)\,Q^I + C_{II,II}(a)\,Q^{II}
\end{aligned} \qquad (8)$$

where $k^I(a)$ and $k^{II}(a)$ are the respective geometry dependent parts of the stress intensity factors induced at the crack tip when loading systems $I$ and $II$ are applied individually to the cracked body. The $C$'s are the compliances. A Legendre transformation of eqn (7) gives

$$\delta\left(U - Q^I q^I - Q^{II} q^{II}\right) = -q^I\,\delta Q^I - q^{II}\,\delta Q^{II} - G\,\delta a \qquad (9)$$

The left hand side of eqn (9) is a perfect differential and this enables us to obtain the following reciprocal relation,

$$\left[\frac{\partial q^{II}(Q^I,Q^{II},a)}{\partial a}\right]_{Q^I,\,Q^{II}} = \left[\frac{\partial\left(K_i\,\Lambda_{ij}\,K_j\right)}{\partial Q^{II}}\right]_{Q^I,\,a} \qquad (10)$$


Eqn (8) can be used to compute the derivative on the right hand side to obtain

$$\left[\frac{\partial q^{II}}{\partial a}\right]_{Q^I,\,Q^{II}} = 2\,K_i\,\Lambda_{ij}\,k_j^{II} \qquad (11)$$

The interpretation of eqn (11) is as follows [5]. Suppose that we know the complete solution, in particular, $k^I(a)$ and $C_{II,I}$, when loading system $I$ is applied. Setting $Q^{II} = 0$ in eqn (11), we obtain

$$\left[\frac{\partial q^{II}}{\partial a}\right]_{Q^I,\,Q^{II}=0} = 2\,K_i^I\,\Lambda_{ij}\,k_j^{II} \qquad (12)$$

It is noted that by knowing the solution for loading system $I$, we can compute every term in eqn (12) except $k_j^{II}$. Thus, when eqn (12) is multiplied through by $Q^{II}$, we arrive at a useful relation

$$Q^{II}\left[\frac{\partial q^{II}}{\partial a}\right]_{Q^I,\,Q^{II}=0} = 2\,K_i^I\,\Lambda_{ij}\,K_j^{II} \qquad (13)$$

for the stress intensities induced by non-zero loading system $II$ alone.


In order to demonstrate how this relation is used, let the displacement fields $\mathbf{u}_1$, $\mathbf{u}_2$ and $\mathbf{u}_3$ be three solutions to the elastic boundary value problem corresponding to the respective loading systems $Q^1$, $Q^2$ and $Q^3$ which are linearly independent of each other. For our discussion, it suffices to assume that each loading, $Q^i$, induces nonzero stress intensity $K_i$ alone at the crack tip. Employing these three solutions in relation (13), we obtain

$$Q^{II}\left[\frac{\partial q_i^{II}}{\partial a}\right]_{Q^i,\,Q^{II}=0} = S_{ij}\,\Lambda_{jm}\,K_m^{II} \qquad (14)$$

where

$$S_{ij} = 2\,K_i\,\delta_{ij} \quad (\text{no sum on } i)$$

and $\delta_{ij}$ is the Kronecker delta. Since $S_{ij}$ is diagonal and positive definite, it admits the inverse $S_{ij}^{-1}$, viz

$$S_{ij}^{-1} = \frac{1}{2K_i}\,\delta_{ij} \quad (\text{no sum on } i)$$

Thus eqn (14) can be inverted to obtain the important relation

$$K_i^{II} = Q^{II}\left[\Lambda_{ij}^{-1}\,S_{jm}^{-1}\,\frac{\partial q_m^{II}}{\partial a}\right]_{Q^m,\,Q^{II}=0} \qquad (15)$$

Rice [5] has observed that the bracketed terms on the right do not depend on the nature of loading system $I$ and hence they are universal functions for the given crack configuration, body geometry and elastic properties. In fact, these functions can be identified as Bueckner's [3] weight functions, $\mathbf{h}_i$, namely

$$\mathbf{h}_i = \Lambda_{ij}^{-1}\,S_{jm}^{-1}\,\frac{\partial\mathbf{u}_m}{\partial a} \qquad (16)$$

and the stress intensity factors can be computed by eqn (2) given above for a general loading system which consists of surface forces, boundary displacements and body forces. For the case of symmetrical mode I loading in an isotropic solid, eqn (16) reduces to the same expression as given by Rice [5],

$$\mathbf{h}_2 = \frac{M}{2K_2(a)}\,\frac{\partial\mathbf{u}_2(\mathbf{x},a)}{\partial a}$$

with $M$ being related to the appropriate elastic constants. The singular stress fields, $\boldsymbol{\sigma}_i^s$, of the weight functions $\mathbf{h}_i$ can also be composed from the stresses, $\boldsymbol{\sigma}_m$, of $\mathbf{u}_m$ as

$$\boldsymbol{\sigma}_i^s = \Lambda_{ij}^{-1}\,S_{jm}^{-1}\,\frac{\partial\boldsymbol{\sigma}_m}{\partial a} \qquad (17)$$

In  order  to  employ  eqns  (16)  and  (17)  to  determine  the  weight  functions  and  their  corresponding  stress 
fields  for  a  semi-infinite  crack  in  an  anisotropic  full  space,  we  shall  follow  the  plane  anisotropic  elasticity 
formulation,  originally  developed  by  Stroh  [10,11],  and  subsequently  used  by  Barnett  and  Asaro  [8];  and  Ting 
and  coworkers  [12-14],  among  others,  to  obtain  three  linearly  independent  solutions  for  the  semi-infinite  crack. 


With respect to the coordinates introduced, the field equations for the anisotropic solid are

$$\epsilon_{ij} = \frac12\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right), \qquad \sigma_{ij} = C_{ijkm}\,\epsilon_{km} \qquad (18)$$

where $\mathbf{u} = \mathbf{u}(x_1,x_2)$; $\boldsymbol{\epsilon} = \boldsymbol{\epsilon}(x_1,x_2)$; and $\boldsymbol{\sigma} = \boldsymbol{\sigma}(x_1,x_2)$ are the displacements, strains and stresses respectively and plane strain conditions are assumed. A general solution to the equations in (18) is [10,11]

$$\mathbf{u} = \mathbf{A}\,f(z) \qquad (19)$$

with $z = x_1 + p\,x_2$, where $\mathbf{A}$ and $p$ are complex. Substituting the general solution into eqns (18), the stresses can be expressed as

$$\boldsymbol{\sigma} = \boldsymbol{\tau}\,\frac{df}{dz}, \qquad \tau_{ij} = \left(C_{ijk1} + p\,C_{ijk2}\right)A_k$$

Equilibrium can be satisfied if

$$\left[C_{i1k1} + p\left(C_{i1k2} + C_{i2k1}\right) + p^2\,C_{i2k2}\right]A_k = 0 \qquad (20)$$

For non-trivial $\mathbf{A}$, the determinant of the bracketed terms would have to be zero and this leads to a sextic characteristic equation for the roots $p$. Eshelby et al. [15] have shown that there are no real roots to this characteristic equation and thus the roots occur in complex conjugate pairs. Following Stroh [10,11], we shall introduce a complex vector $\mathbf{L}$,


$$L_i = \tau_{i2} = \left(C_{i2k1} + p\,C_{i2k2}\right)A_k \qquad (21a)$$

or

$$L_i = -\left(C_{i1k2} + p^{-1}\,C_{i1k1}\right)A_k \qquad (21b)$$

Equation (20) can then be recast as

$$-\,C_{i2j2}^{-1}\,C_{j2k1}\,A_k + C_{i2j2}^{-1}\,L_j = p\,A_i \qquad (22a)$$

$$\left[C_{i1j2}\,C_{j2m2}^{-1}\,C_{m2k1} - C_{i1k1}\right]A_k - C_{i1j2}\,C_{j2k2}^{-1}\,L_k = p\,L_i \qquad (22b)$$

where $C_{i2j2}^{-1}$ denotes the inverse of the matrix $C_{i2j2}$. These equations are in the form of a standard eigenvalue problem with $p$ being the eigenvalue and the vector $\{\mathbf{A}, \mathbf{L}\}$ being the eigenvector. Standard procedures (e.g. EISPACK [16]) can be employed to extract the eigenvalues and the eigenvectors efficiently. Since the eigenvalues are all complex, we shall order them such that $p_\alpha$ have positive imaginary parts and $\bar{p}_\alpha$ are their complex conjugates. Greek subscripts have range 1 to 3 but they do not conform to the summation convention. The corresponding eigenvectors will be denoted by $\mathbf{A}_\alpha$ and $\mathbf{L}_\alpha$ with complex conjugates $\bar{\mathbf{A}}_\alpha$ and $\bar{\mathbf{L}}_\alpha$. Assuming that all the eigenvalues $p_\alpha$ are distinct, the general solution $\mathbf{u}$ and $\boldsymbol{\sigma}$ can be expressed as a linear combination of all the eigenvectors as

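$$\mathbf{u} = 2\,\mathrm{Re}\sum_{\alpha=1}^{3}\mathbf{A}_\alpha\,f_\alpha(z_\alpha), \qquad \boldsymbol{\sigma} = 2\,\mathrm{Re}\sum_{\alpha=1}^{3}\boldsymbol{\tau}_\alpha\,\frac{df_\alpha}{dz_\alpha}$$

The cases of repeated roots for $p_\alpha$ can also be treated by using methods given by Ting and Chou [12] and Ting [13] to construct the appropriate eigenvectors.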
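The eigenvalue problem of eqns (22a,b) can be sketched numerically as follows. NumPy's general eigenvalue solver is used here in place of EISPACK, and a cubic crystal aligned with the coordinate axes serves as the test material. The function names and the elastic constants in the usage note (roughly those of copper, in GPa) are illustrative assumptions.

```python
import itertools
import numpy as np


def cubic_stiffness(c11, c12, c44):
    """Fourth-order stiffness C_ijkm of a cubic crystal aligned with the axes."""
    C = np.zeros((3, 3, 3, 3))
    d = np.eye(3)
    for i, j, k, m in itertools.product(range(3), repeat=4):
        C[i, j, k, m] = (c12 * d[i, j] * d[k, m]
                         + c44 * (d[i, k] * d[j, m] + d[i, m] * d[j, k]))
        if i == j == k == m:
            C[i, j, k, m] += c11 - c12 - 2.0 * c44
    return C


def stroh_eigen(C):
    """Solve eqns (22a,b); return the three roots with Im(p) > 0 and their A, L."""
    Q = C[:, 0, :, 0]  # C_i1k1
    R = C[:, 0, :, 1]  # C_i1k2, so that R.T is C_i2k1
    T = C[:, 1, :, 1]  # C_i2k2
    T_inv = np.linalg.inv(T)
    N = np.block([[-T_inv @ R.T, T_inv],
                  [R @ T_inv @ R.T - Q, -R @ T_inv]])
    p, vecs = np.linalg.eig(N)
    keep = np.where(p.imag > 0)[0]
    keep = keep[np.argsort(p[keep].real)]
    return p[keep], vecs[:3, keep], vecs[3:, keep], (Q, R, T)
```

Calling `stroh_eigen(cubic_stiffness(168.4, 121.4, 75.4))` returns three roots with positive imaginary part; the columns of the returned `A` and `L` arrays are the corresponding eigenvectors, each defined only up to an arbitrary complex scale.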

We introduce three vectors $\mathbf{M}_\alpha$ (Stroh [10,11]) which are reciprocal to $\mathbf{L}_\alpha$ such that

$$\mathbf{M}_\alpha\cdot\mathbf{L}_\beta = \delta_{\alpha\beta}$$

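The three linearly independent solutions for a semi-infinite crack, with each solution $\mathbf{u}_i$ corresponding to one stress intensity factor $K_i$ being induced at the crack tip alone, can be written as [14]

$$\mathbf{u}_i = \sqrt{\frac{2r}{\pi}}\;K_i\;\mathrm{Re}\sum_{\alpha=1}^{3}\left(\mathbf{M}_\alpha\cdot\mathbf{e}_i\right)\mathbf{A}_\alpha\,\zeta_\alpha^{1/2} \quad (\text{no sum on } i)$$

and the respective stresses are

$$\boldsymbol{\sigma}_i = \frac{K_i}{\sqrt{2\pi r}}\;\mathrm{Re}\sum_{\alpha=1}^{3}\left(\mathbf{M}_\alpha\cdot\mathbf{e}_i\right)\boldsymbol{\tau}_\alpha\,\zeta_\alpha^{-1/2} \quad (\text{no sum on } i)$$

where

$$\zeta_\alpha = \cos\theta + p_\alpha\sin\theta$$

Even though the eigenvectors $\mathbf{A}_\alpha$, $\mathbf{L}_\alpha$ are determined to within arbitrary constants from eqns (22a,b), the products of the components of $\mathbf{A}_\alpha$ and $\mathbf{M}_\alpha$ are uniquely determined. The displacement and stress derivatives with respect to crack length can be derived by employing the following relations

$$\frac{\partial}{\partial a} = \frac{\partial r}{\partial a}\frac{\partial}{\partial r} + \frac{\partial\theta}{\partial a}\frac{\partial}{\partial\theta}, \qquad \frac{\partial r}{\partial a} = -\cos\theta, \qquad \frac{\partial\theta}{\partial a} = \frac{\sin\theta}{r}$$

to obtain

$$\frac{\partial\mathbf{u}_i}{\partial a} = -\frac{K_i}{\sqrt{2\pi r}}\;\mathrm{Re}\sum_{\alpha=1}^{3}\left(\mathbf{M}_\alpha\cdot\mathbf{e}_i\right)\mathbf{A}_\alpha\,\zeta_\alpha^{-1/2} \quad (\text{no sum on } i)$$

$$\frac{\partial\boldsymbol{\sigma}_i}{\partial a} = \frac{K_i}{2\sqrt{2\pi}\;r^{3/2}}\;\mathrm{Re}\sum_{\alpha=1}^{3}\left(\mathbf{M}_\alpha\cdot\mathbf{e}_i\right)\boldsymbol{\tau}_\alpha\,\zeta_\alpha^{-3/2} \quad (\text{no sum on } i)$$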

The matrix $\Lambda$, which is related to the pre-logarithm energy factor of a straight dislocation lying parallel to the crack front, can be expressed in terms of $\mathbf{A}_\alpha$ and $\mathbf{M}_\alpha$ as (Stroh [10,11])

A"1  =  2  n  IT1  (25) 


B  =  “  £  [  A„  Ma  -  Aa  Ma  ] 

1  a=i  . 

where the terms in the bracket are dyadic products of the respective vectors.

Thus, once Stroh's eigenvalues $p_\alpha$ and Stroh's eigenvectors $\mathbf{A}_\alpha$ and $\mathbf{L}_\alpha$ are computed for the given anisotropy of the body under consideration, the weight functions $\mathbf{h}_i$ and their associated singular stresses $\boldsymbol{\sigma}_i^s$ for a semi-infinite crack can be obtained by using eqns (23-25) in eqns (16) and (17).

In the finite element implementation of the variational principle discussed in section 2, we would choose the modified singular displacements $\mathbf{b}_i^s$ and stresses $\boldsymbol{\sigma}_i^s$ to be those corresponding to the semi-infinite crack geometry obtained above for all weight function computations in 2-D.

ABSTRACT


The work is directed to the void softening mechanism of shear banding in ductile high strength steels. Elastic-plastic analyses of the field near a pair of interacting voids were conducted using a finite element formulation and large scale computational facilities. Results suggest dramatic intensification of strain between interacting voids. The nature of void interaction was found to be significantly different in the cases of nominal shear and uniaxial extension, consistent with experimental observations of void linking.

INTRODUCTION.  Shear banding is a serious mode of degradation of high strength steel loaded into the plastic range, and important design issues require an understanding of its causes. Controlled shear experiments have demonstrated that localization into narrow shear bands occurs at a material characteristic strain level. Banding occurs under both high-rate and quasi-static loadings. Once strain localizes, continued deformation to fracture occurs under decreasing applied stress. Hence, there is an underlying strain-softening mechanism associated with the banding event. In high rate loadings, thermal softening results from the heat of plastic deformation and near-adiabatic conditions. Thermal effects are insignificant in slow loading and thus other softening mechanism(s) must be involved.

Metallographic investigations* of sectioned shear specimens
of high strength 4340 steel have found evidence that microscopic
voids play an important role in shear banding. Voids were
observed at debonded grain refinement particles (0.1 micron size
scale) and microcracks were found linking neighboring voids. It
is speculated that the microcracks develop by coalescence of
smaller scale voids which nucleate at strengthening particles as
a result of the local strain intensification of the dominant
pair.

In an attempt to elucidate the role of voids in the
localization event, two-dimensional (plane strain elastic-plastic)
analyses were conducted to establish the field solution between
a pair of interacting voids under three different nominally
uniform strain fields. The loadings considered were simple
shear, uniaxial tension, and combined shear with extension.
While the specified kinematic loadings represent nominally
uniform strain fields, in the vicinity of the voids the stress
and strain are found to be extremely complex with substantial
interaction features.

NUMERICAL FORMULATION. The nonlinear elastic-plastic analysis²
was performed using an incremental finite element formulation.
Classical non-hardening Prandtl-Reuss constitutive theory was
employed. The mesh used in the analysis is displayed in Figure
1. It consists mostly of quadrilaterals each subdivided into
four linear strain triangles by its diagonals. As drawn, there
are approximately 3000 degrees of freedom in the mesh. Symmetry
allows a single quadrant to be analyzed in extensional loading,
and in simple shear, one-half of the mesh is sufficient.

Loading was specified by displacement conditions on the
outer boundary. In simple shear, the boundaries were
constrained against motion in the y-direction, while
displacement in the x-direction was specified as a linear function of y.
In uniaxial extension, the boundaries were constrained against
displacement in the x-direction, while y-displacement was
specified as a linear function of y. The combined mode loading
considered was a superposition of the simple shear and uniaxial
extension kinematic boundary conditions. Nominally, the simple
shear problem involves no normal strain, while the uniaxial
extension problem nominally involves no shear nor contractional
components of normal strain.

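The assumed displacement method of finite element analysis
was used so that the problem at each load step was to find the
vector of displacement increments $\Delta u_i$. If we consider
components $i = 1$ through $m-1$ as the specified non-zero boundary
values and denote them as $\Delta u_i^*$, the stiffness equations for the
degrees of freedom $m$ through $n$ take the form:

\[
\begin{bmatrix} K_{mm} & \cdots & K_{mn} \\ \vdots & & \vdots \\ K_{nm} & \cdots & K_{nn} \end{bmatrix}
\begin{bmatrix} \Delta u_m \\ \vdots \\ \Delta u_n \end{bmatrix}
= \sum_{i=1}^{m-1} \Delta u_i^*
\begin{bmatrix} -K_{mi} \\ \vdots \\ -K_{ni} \end{bmatrix}
\qquad (1)
\]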
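
As a minimal sketch of how equation (1) can be used, the fragment below moves the columns of the stiffness matrix that multiply specified displacement increments to the right-hand side and solves for the remaining increments. The three-spring chain, the unit stiffnesses, and the helper names `solve_linear` and `solve_free_dofs` are illustrative assumptions, not part of the analysis reported here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_free_dofs(K, specified):
    """specified maps a degree of freedom to its prescribed increment.

    Returns a dict of the increments at the remaining (free) degrees of freedom.
    """
    free = [i for i in range(len(K)) if i not in specified]
    K_ff = [[K[r][c] for c in free] for r in free]
    # Right-hand side of (1): minus the specified columns times their increments.
    rhs = [-sum(K[r][i] * du for i, du in specified.items()) for r in free]
    return dict(zip(free, solve_linear(K_ff, rhs)))


# Hypothetical example: a chain of three unit springs (four nodes),
# with the two end displacements specified.
K = [[1.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 1.0]]
du_free = solve_free_dofs(K, {0: 0.0, 3: 0.3})
```

For this chain the interior increments vary linearly between the two specified end values, which is a convenient check on the partitioning.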

The stiffness matrix $K$ is assembled from element stiffness
matrices $k$ which have the form

\[ k = \int_v B^T D B \, dv \qquad (2) \]

where $B$ follows from the displacement interpolation function and
$D$ is the incremental constitutive matrix relating stress and
strain increments,

\[ \Delta\epsilon = B \, \Delta u \qquad (3) \]

\[ \Delta\sigma = D \, \Delta\epsilon . \qquad (4) \]

It is the constitutive matrix $D$ which is the source of the
nonlinearity of the formulation. The rate form of the
constitutive law has $D$ dependent upon current stress state. If
the current state is within the yield surface, then $D$
represents linear elastic behavior. If the current stress state
is on the yield surface, then $D$ represents the ability of the
material to plastically flow in the direction normal to the
yield surface. As load is increased, the stress distribution
changes and the zone of plasticity expands. Thus, the appropriate
$D$ at a point continually changes.

In our numerical analysis, finite values are necessarily
specified for the increments of the boundary displacements $\Delta u_i^*$.
Thus, the appropriate $D$ at each grid point will change within
the load increment. An implicit approach is employed whereby a
step average $D$ is used that accounts for yielding, stress
changes along the yield surface, and possibly unloading within
the step. The solution for the increments $\Delta u_i$ is gained using
a successive approximation iterative procedure. At each iterate
$\Delta u_i$, an average $D$ is formed, if necessary, at each grid point,
as follows:


\[ D^{av} = f_e D^{el} + (1 - f_e)\,(D^{el} - 2G\, n n^T) \qquad (5) \]

The weighting factor $f_e$ and average yield surface unit normal $n$
are defined in terms of $\Delta u_i$. $D^{el}$ is the elastic constitutive
matrix and $G$ is the elastic shear modulus. The vector $n$ is
defined so that the stress state at the end of the increment
satisfies the yield condition. The issue of stress scaling and
associated load imbalance common to tangent approaches is not
encountered in the implicit formulation.
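
The action of the step-average matrix in (5) can be illustrated with a short sketch. It assumes an isotropic elastic law written with the Lame constant `lam` and shear modulus `G`, represents strain increments and the unit normal `n` as 3x3 symmetric tensors, and reads the product with `n n^T` as the tensor contraction `n (n : d_eps)`. These representation choices and the function names are assumptions made for illustration; the element-level implementation used in the analysis is not given in the text.

```python
import math


def elastic_stress(d_eps, lam, G):
    """Isotropic elastic law: d_sigma = lam * tr(d_eps) * I + 2 G d_eps."""
    tr = d_eps[0][0] + d_eps[1][1] + d_eps[2][2]
    return [[lam * tr * (i == j) + 2.0 * G * d_eps[i][j] for j in range(3)]
            for i in range(3)]


def contract(a, b):
    """Double contraction a : b of two 3x3 tensors."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))


def averaged_stress(d_eps, n, f_e, lam, G):
    """Apply D_av = f_e D_el + (1 - f_e)(D_el - 2 G n n^T) to d_eps."""
    d_sig_el = elastic_stress(d_eps, lam, G)
    n_eps = contract(n, d_eps)
    return [[d_sig_el[i][j] - (1.0 - f_e) * 2.0 * G * n[i][j] * n_eps
             for j in range(3)] for i in range(3)]


# Hypothetical check: n is the unit normal for a pure shear stress state.
s = 1.0 / math.sqrt(2.0)
n = [[0.0, s, 0.0], [s, 0.0, 0.0], [0.0, 0.0, 0.0]]
G, lam = 1.0, 1.5

# A strain increment along n is resisted only by the elastic fraction f_e.
partly_plastic = averaged_stress(n, n, 0.25, lam, G)
fully_plastic = averaged_stress(n, n, 0.0, lam, G)
```

With `f_e = 0` the increment along `n` produces no stress change, which is the non-hardening limit; with `f_e = 1` the elastic response is recovered.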

Regardless of formulation, finite load steps involve a load
path discretization error. This error was controlled by
employing an adaptive load incrementation procedure. The
size of the load increment (scale factor adjusting the magnitude
of specified components $\Delta u_i^*$) is altered during the iteration to
satisfy a condition on the stress solution. The condition
dictates that the maximum deviatoric stress change (along the
yield surface) accompanying plastic flow should equal a
specified fraction of the uniaxial yield stress Y. In the
results presented, this stress change fraction was specified
equal to 0.05. Using a convergence test that has successive
iterations with a relative difference in stress increments less
than 0.001, typically 4 iterations are found necessary at each
load step. Typical solutions which trace plasticity from first
yield to general yield conditions involve approximately 50 load
steps. Hence, it is common to perform on the order of 200
solutions to the stiffness equation (1) in a particular load
case analysis.

NUMERICAL  SOLUTIONS.  We  discuss  here  details  of  the  solution 
for  the  void  pair  shown  in  Figure  1  under  three  different  modes 
of  imposed  strain.  In  each  of  the  three  analyses  conducted,  the 
solution  history  was  incrementally  traced  from  the  point  of 
first  yield  to  the  onset  of  general  yielding  throughout  the  band 
containing  the  voids. 

First, we consider the solution for the case of imposed
nominal shear. General yield in this case occurs at a shear
strain slightly below the yield strain value $(Y/\sqrt{3})/G$. The
plastic zone development is illustrated in Figure 2. Elements
with a stress state satisfying the Mises yield condition are
drawn at nominal strain levels of 60, 77, and 91% of the yield
strain. First yield occurs at the surfaces of the voids at
positions roughly 45 degrees from the ligament. The imposed
strain at first yield was found to equal 49% of the yield
strain, suggesting an elastic concentration factor approximately
equal to 2.0. As the plastic zones at the void surface grow, a
separate distinct zone develops along the ligament. A mechanism
for extensive local straining is possible once the zones link.
Figure 3 illustrates the fact that on the ligament, the strain
is fairly uniform over most of the load history. However, once
general yielding conditions are achieved, maxima develop at a
distance roughly 0.30 ahead of the voids. Figure 4 is a plot of
local strain (maximum on the ligament) divided by nominal strain
as a function of nominal strain. The plot demonstrates the
surge in local straining that occurs once the plastic mechanism
is established. The strain intensification rate reaches a value
in excess of 30 once general yielding is achieved.

In the case of uniaxial extension, the nominal stress state
is triaxial tension. For Poisson's ratio of 0.3, which was the
value used in these analyses, yielding in the band without voids
occurs at the uniaxial strain value of 1.3 Y/E. The plastic
zone at a strain level of 1.27 Y/E is drawn in Figure 5a. The
fully developed plasticity region is restricted to the void
region at this nominal strain level. Figure 5b shows those
elements which satisfy the near-isochoric condition which has
the magnitude of the normal strain increments differing by less
than 6%. At this strain level, the stress data along the
ligament agrees very well with the logarithmic spiral slipline
stress distribution, Figure 6. Consistent with the plastic zone
development, the strain maximum is at the void surface in this
problem throughout the loading history.

Under a state of combined extension and shear, with $\epsilon_{nom}$
equal to $\gamma_{nom}$, yielding occurs when the two strain components
reach the value 0.98 Y/E. The plastic zone and near-isochoric
region are drawn in Figure 8 for a strain level slightly
exceeding this nominal yield value. In this problem, plastic
zones grow from the void surfaces slightly skewed from the
ligament. The separate zones from the voids merge by the
development of a narrow plastic region between them oriented in
the loading direction. The normal and shear stress distributions
are provided in Figures 9 and 10. As general yield
conditions are reached, the maximum normal stress is seen to
shift to the center of the ligament. While the maximum shear
stress is at or near the ligament center throughout loading, its
value can be seen to at first increase and then decrease as
general yield conditions are approached. An interesting aspect
of the final shear stress distribution is the flat, near-zero
valued regions near the voids, indicative of prevailing
logarithmic spiral regions there. The strain distributions for
this case show the maximum normal strain at the void surfaces at
the beginning of loading. The maximum shifts to the center of
the ligament once general yielding is achieved. The shear
strain is maximum at the ligament center throughout loading in
this problem.
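
The nominal yield strains quoted for the three loadings can be checked with a short calculation. The sketch below assumes a uniform plane strain field without voids, isotropic linear elasticity with Poisson's ratio 0.3, and the Mises yield condition; the function name and the normalization Y = E = 1 are choices made only for this illustration.

```python
import math


def nominal_yield_strain(eps_y, gamma_xy, nu=0.3):
    """Scale factor at which the strain mode (eps_y, gamma_xy), with
    eps_x = eps_z = 0, first satisfies the Mises condition.

    With Y = E = 1 the returned value is the nominal strain in units of Y/E.
    """
    E = 1.0
    G = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    # Stresses produced by a unit multiple of the imposed strain mode.
    sx = lam * eps_y
    sy = (lam + 2.0 * G) * eps_y
    sz = lam * eps_y
    txy = G * gamma_xy
    mises = math.sqrt(0.5 * ((sx - sy) ** 2 + (sy - sz) ** 2 + (sz - sx) ** 2)
                      + 3.0 * txy ** 2)
    return 1.0 / mises


shear = nominal_yield_strain(0.0, 1.0)      # simple shear
extension = nominal_yield_strain(1.0, 0.0)  # uniaxial extension
combined = nominal_yield_strain(1.0, 1.0)   # eps_nom equal to gamma_nom
```

Under these assumptions `extension` equals 1.3 and `combined` is close to 0.98, matching the values stated in the text, while `shear` equals (Y/sqrt(3))/G expressed in units of Y/E.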

SUMMARY. The numerical solutions suggest that the interaction
between voids in elastic-plastic deformation fields depends
strongly upon the nominal straining mode. It can be readily
inferred that the nature of the interaction is also strongly
dependent upon the orientation of the voids with respect to the
principal strain directions.

For the simple shear loading (with the void centerline
aligned with the shear direction) the majority of the ligament
experiences a reasonably similar strain history with magnitudes
significantly in excess of applied strain. The minimum strain
region is at the void surface in this loading case. On the
contrary, under uniaxial extension normal to the void
centerline, the void surface experiences the maximum strain
levels throughout the loading history. In combined loading, the
strain distribution was shown to change character, from
void surface straining to strain maximum at the ligament center,
once the plastic zones of the voids merge.

FIGURE 2 - Development of plastic instability between void
pair under simple shear. Plastic zones at nominal
shear strain levels 60, 77, and 91% of $\gamma_{yield}$.


FIGURE  3  -  Shear  strain  distributions  on  ligament  between 
voids  under  nominal  simple  shear 


FIGURE  4  -  Strain  intensification  between  voids  under 
nominal  simple  shear 

FIGURE 5 - (A) Plastic zone under nominal uniaxial strain
level 1.27 Y/E
(B) Fully developed near-isochoric portion of
plastic zone shown in (A)

FIGURE 6 - Numerical data for normal stress on ligament at
nominal uniaxial strain 1.27 Y/E

FIGURE 7

FIGURE 8 - (A) Plastic zone under nominal combined strain
level 0.99 Y/E
(B) Fully developed near-isochoric portion of
plastic zone shown in (A)

FIGURE 9 - Normal stress on ligament at various levels
of imposed combined strain

Shear strain on ligament at various levels
of imposed combined strain




FRACTALS,  FRAGMENTATION,  AND  FAILURE 


Donald  L.  Turcotte 
Department  of  Geological  Sciences 
Cornell  University 
Ithaca,  NY  14853 


ABSTRACT. Many applied problems exhibit a fractal character; a necessary
condition is that the fundamental phenomena be scale invariant over a
reasonably wide range of scales. In many cases a renormalization group approach can
give an applicable solution. A specific example is fragmentation. The
number-size distribution for fragments often satisfies the power law fractal relation.
A renormalization group approach can be used to obtain the fractal dimension
associated with catastrophic fragmentation. The renormalization group approach
can also be applied to the failure of a fractal network. A gridded network can
be constructed that obeys fractal geometrical constraints. The elements of the
network are given a statistical distribution of strengths. Stress transfer
from failed elements to adjacent sound elements is an essential feature of the
analysis. The renormalization group approach specifies a catastrophic failure
criterion for the network. An example of an application is the failure of a
stranded cable that has been constructed according to fractal constraints.


1. INTRODUCTION. It is recognized that there are a variety of scale
invariant processes in nature; the concept of fractals provides a means of
quantifying these processes. A fractal distribution can be defined by

\[ N \sim r^{-D} \qquad (1) \]

where N is the number (of objects) with a characteristic linear dimension
greater than r, and D is the fractal dimension. The original definition of a
fractal [1] related the length (perimeter) P of a coastline (or topographic
contour) to the length of a yardstick r by

\[ P \sim N r \sim r^{1-D} \qquad (2) \]

Typical coastlines or topographic contours have D = 1.2 - 1.3.


2. FRAGMENTATION. A material can be fragmented in a variety of ways.
Rocks can be fragmented by joints and weathering. In this case the
distribution of fragment sizes is likely to be determined by the preexisting planes of
weakness in the rock. Fragments can also be produced by explosives. Again
preexisting planes of weakness may determine the distribution of fragment
sizes. Fragments can also be produced by impacts. Impacts are likely to have
played a dominant role in determining the number-size relation for asteroids
and meteorites.

A variety of statistical distributions have been used to correlate the
number-size data on fragments. However, in many cases a power law relation was
determined. Some of these results are given in Figure 1. Included are data
for broken coral [2], an underground nuclear explosion [3], and a basalt
fragmented by an impacting projectile [4]. Other examples have been tabulated by
Turcotte [5].


FIGURE 1. Dependence of the number
of fragments N with a linear dimension
greater than r on r. Data for
broken coral [2], an underground
nuclear explosion [3], and basalt
fragmented by an impacting
projectile [4]. The best fit fractal
dimension D defined by (1) is given
for each example.
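
A best-fit fractal dimension of the kind quoted in Figure 1 can be obtained from a least-squares line through the number-size data on log-log axes. The sketch below assumes that procedure and uses synthetic data generated from an exact power law; neither the data nor the function name come from the references cited above.

```python
import math


def fit_fractal_dimension(sizes, counts):
    """Least-squares slope of ln N against ln r; D is minus that slope."""
    xs = [math.log(r) for r in sizes]
    ys = [math.log(n) for n in counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic (not measured) data following N = 1000 r^(-2.5).
sizes = [0.5, 1.0, 2.0, 4.0, 8.0]
counts = [1000.0 * r ** -2.5 for r in sizes]
D_fit = fit_fractal_dimension(sizes, counts)
```

For measured data the fit would normally be restricted to the range of sizes over which the points fall on a straight line.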


The applicability of a fractal distribution indicates that fragmentation
is scale invariant over a wide range of scales. In order to model
fragmentation we will use the renormalization group approach. For simplicity we will
consider a cube of material with a linear dimension h as illustrated in Figure
2. This cube is referred to as a cell that is divided into eight cubic
elements each with a dimension h/2. Attention is now focused on one of the cubic
elements, and it becomes a cell of dimension h/2 at order 1. This cell is then
divided into eight first-order elements each with a dimension h/4 as
illustrated in Figure 2. The process is repeated at successively higher orders.


FIGURE 2. Illustration of the
renormalization group approach to
fragmentation. A zero-order cubic
cell with dimension h is divided
into eight cubic elements with
dimension h/2. Each of these
elements becomes a first-order cell
and is divided into eight first-order
elements with dimensions h/4.
The process is repeated to higher
orders.


The basic hypothesis of the renormalization group approach is the
assumption that the probability $p_c$ that a cell will fragment into eight elements is the
same at all orders. If initially there are $N_0$ cubes of dimension h, the number
remaining after fragmentation is $N_0' = (1-p_c)N_0$. The number of fragments with
dimension $h_1 = h/2$ is $N_1 = 8p_c(1-p_c)N_0$, the number of fragments with dimension
$h_2 = h/2^2$ is $N_2 = (8p_c)^2(1-p_c)N_0$, and generalizing the number of fragments
with dimension $h_n = h/2^n$ is $N_n = (8p_c)^n(1-p_c)N_0$. The generalized form can be
written

\[ \ln\frac{h_n}{h} = -n \ln 2 \qquad (3) \]

\[ \ln\frac{N_n}{N_0'} = n \ln(8p_c) \qquad (4) \]

And eliminating n gives

\[ \frac{N_n}{N_0'} = \left(\frac{h_n}{h}\right)^{-\ln(8p_c)/\ln 2} \qquad (5) \]

Comparison with (1) gives

\[ D = \frac{\ln(8p_c)}{\ln 2} \qquad (6) \]


Thus the fractal dimension D is directly related to probability that a fragment
of a given size is broken into smaller elements. The probability $p_c$ is
dependent on the specific model chosen but the fractal dimension is independent of
the model.
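
The link between the cascade counts and the fractal dimension can be verified numerically. The sketch below takes the counts as $N_n = (8p_c)^n(1-p_c)N_0$ at sizes $h_n = h/2^n$, as described above, and compares the log-log slope with $\ln(8p_c)/\ln 2$. The value of `p_c` and the function names are placeholders chosen for illustration.

```python
import math


def fragment_counts(p_c, n_orders, N0=1.0, h=1.0):
    """Pairs (h_n, N_n) with h_n = h / 2**n and N_n = (8 p_c)**n (1 - p_c) N0."""
    return [(h / 2 ** n, (8.0 * p_c) ** n * (1.0 - p_c) * N0)
            for n in range(1, n_orders + 1)]


def fractal_dimension(p_c):
    """D = ln(8 p_c) / ln 2."""
    return math.log(8.0 * p_c) / math.log(2.0)


p_c = 0.3  # illustrative value only
data = fragment_counts(p_c, 6)
(h_a, N_a), (h_b, N_b) = data[0], data[-1]
# Slope of ln N against ln h between the first and last orders, sign reversed.
slope_D = -math.log(N_b / N_a) / math.log(h_b / h_a)
```

Since each cell yields eight elements, this relation gives D between 0 and 3 as `p_c` ranges from 1/8 to 1.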

We next determine a specific value for D by specifying a fragmentation
model. Following Allègre et al. [6] each element in a cell is hypothesized to
be either fragile or sound. It is necessary to determine a condition for the
probability that a cell is fragile $p_n$ in terms of the probability that an
element is fragile $p_{n+1}$. In each cell there can be zero to eight fragile
elements; there are $2^8 = 256$ possible combinations. Excluding multiplicities,
there are 22 topologically different configurations.

We hypothesize that the sides of a fragile element form planes of
weakness. If the sides of fragile elements form an internal plane through the
cell, the cell is assumed to be fragile. Examples of sound and fragile cells
are given in Figure 3. In each case there are four fragile elements (shaded).
In "a" no internal planes of weakness cut the cell, so it is sound; in "b" both
a horizontal and a vertical plane of weakness cut the cell and it is fragile.


FIGURE  3.  Each  cubic  cell  contains 
four  fragile  elements  (shaded)  and 
four  sound  elements.  In  "a"  the 
cell  is  sound,  and  in  "b"  the  cell 
is  fragile. 

The probability $p_n$ that a cell is fragile is related to the probability $p_{n+1}$
that an element is fragile by

\[ p_n = 3p_{n+1}^8 - 32p_{n+1}^7 + 88p_{n+1}^6 - 96p_{n+1}^5 + 38p_{n+1}^4 \qquad (7) \]

The dependence of $p_n$ on $p_{n+1}$ is given in Figure 4. The straight line $p_n = p_{n+1}$
is also included. The points 0 and 1 are stable fixed points of the system.
The iterative relation crosses the straight line at $p_n = p_{n+1} = p_c = 0.490$.
This is an unstable fixed point that separates the region of stable behavior
from the region of unstable behavior. The critical probability $p_c$ corresponds
to catastrophic fragmentation of the object. Assuming that each fragmented
cube breaks into eight pieces we find from (6) that $p_c = 0.490$ corresponds to
D = 1.97.
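
The critical probability and the corresponding fractal dimension can be reproduced numerically. The sketch below reads the renormalization relation (7) as the polynomial $3p^8 - 32p^7 + 88p^6 - 96p^5 + 38p^4$, which is the form consistent with the quoted fixed point 0.490, and locates the non-trivial fixed point by bisection. The bracketing interval and the function names are assumptions made for this illustration.

```python
import math


def cell_fragility(p):
    """Probability that a cell is fragile when each element is fragile with probability p."""
    return 3 * p**8 - 32 * p**7 + 88 * p**6 - 96 * p**5 + 38 * p**4


def critical_probability(lo=0.3, hi=0.7, tol=1e-12):
    """Bisection for the root of cell_fragility(p) - p between lo and hi."""
    def g(p):
        return cell_fragility(p) - p

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


p_c = critical_probability()
D = math.log(8.0 * p_c) / math.log(2.0)
```

Iterating `cell_fragility` from a starting value slightly below `p_c` drives the probability toward 0, and from slightly above toward 1, which is the sense in which the fixed point is unstable.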


FIGURE 4. Probability of fragility
at order n, $p_n$, as a function of the
probability of fragility at order
n+1, $p_{n+1}$, from (7). The critical
probability $p_c$ corresponding to
catastrophic fragmentation is 0.490.
The corresponding fractal dimension
from (6) is 1.97.


3. NETWORK FAILURE. The renormalization group technique can also be
applied to the failure of a fractal network [7,8]. The network is modeled as
the fractal tree illustrated in Figure 5. Failure is associated with the
application of a vertical load V to the tree. Each branch is given a random
strength such that the probability of failure of the branch under load v is
given by a Weibull distribution

\[ P_1 = 1 - \exp[-(v/v_0)^2] \qquad (8) \]

where $v_0$ is a reference load. For each pair of branches the probability of
failure of both is $P_1^2$, one is $2P_1(1-P_1)$, and neither is $(1-P_1)^2$.


FIGURE 5. Illustration of a fractal
tree. The height of the nth level
is $h_n = h_1/2^{n-1}$; the number of
branches at the nth level is $2^n$.

However, we hypothesize that if one branch fails, the load is transferred
to the adjacent branch in the pair. The second branch may suffer induced
failure due to this transfer. We use the conditional probability $P_{2,1}$, that an
unbroken branch already supporting a load v will fail when an additional load v
is transferred to it from an adjacent broken branch. The probability that a
pair will fail is given by

\[ {}^{n-1}P_1 = {}^{n}P_1^{\,2} + 2\,{}^{n}P_1\,(1 - {}^{n}P_1)\,{}^{n}P_{2,1} \qquad (9) \]

The conditional probability is given by

\[ {}^{n}P_{2,1} = \frac{{}^{n}P_2 - {}^{n}P_1}{1 - {}^{n}P_1} \qquad (10) \]

where ${}^{n}P_2$ is the probability of failure under load 2v. For the second order
Weibull distribution given in (8) we have

\[ {}^{n}P_2 = 1 - (1 - {}^{n}P_1)^4 \qquad (11) \]

Combining (9), (10), and (11) gives

\[ {}^{n-1}P_1 = 2\,{}^{n}P_1\,[1 - (1 - {}^{n}P_1)^4] - {}^{n}P_1^{\,2} \qquad (12) \]

This is a failure condition for a pair of branches.


It is implicit in the renormalization group approach that the failure
condition (12) is applicable at all levels of the fractal tree. The dependence
of ${}^{n-1}P_1$ on ${}^{n}P_1$ is given in Figure 6. Again the characteristic S-shaped curve
is obtained, and the straight line ${}^{n-1}P_1 = {}^{n}P_1$ is included. The two stable
fixed points are 0 and 1 and the unstable fixed point is
${}^{n}P_1 = {}^{n-1}P_1 = P_1^* = 0.2063$. The
corresponding value of the critical load from (8) is $v^* = 0.4806\,v_0$. It is
interesting to note that this critical load is considerably less than the mean
strength of a branch, which is $\bar{v} = 0.8862\,v_0$ from (8).
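
The critical values quoted above can be checked directly. The sketch below takes the pair failure relation (12) in the form $2P[1-(1-P)^4] - P^2$, notes that dividing the fixed-point condition by P reduces it to $(1-P)^3 = 1/2$, and then inverts the Weibull law (8). The variable names are illustrative, and the mean strength uses the standard result that a Weibull distribution of order 2 has mean $v_0\,\Gamma(3/2)$.

```python
import math


def pair_failure(p1):
    """Failure probability of a pair of branches with load transfer, as in (12)."""
    return 2.0 * p1 * (1.0 - (1.0 - p1) ** 4) - p1 ** 2


# Non-trivial fixed point: p = pair_failure(p) reduces to (1 - p)**3 = 1/2.
p_star = 1.0 - 0.5 ** (1.0 / 3.0)

# Invert P1 = 1 - exp(-(v/v0)**2) for the critical load in units of v0.
v_star_over_v0 = math.sqrt(-math.log(1.0 - p_star))

# Mean branch strength in units of v0 for the order-2 Weibull law.
mean_over_v0 = math.gamma(1.5)
```

The ratio `v_star_over_v0 / mean_over_v0` is roughly 0.54, which quantifies how far below the mean branch strength the critical load lies.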


FIGURE 6. Probability of failure of
a branch at the n-1 level, ${}^{n-1}P_1$, as
a function of the probability of
failure at the n level, ${}^{n}P_1$, from
(12). The critical probability $P_1^*$
corresponding to catastrophic
failure is 0.2063.
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Abstract 


In this article we present a brief summary
of the numerical solution of a system
of random Volterra integral equations. The
methods we use are (i) Newton's method and
(ii) successive approximation method. Based
on the simulation, we discuss the mean and
variance of the solution of a system of random
Volterra integral equations.

1. INTRODUCTION


The study of random Volterra integral equations and their
applications is an active area of research in probabilistic
analysis. However, the numerical treatment of a system of
random Volterra integral equations is yet to be explored.

For a recent survey of approximate solution of random integral
equations we refer to Bharucha-Reid and Christensen [4].
For the numerical treatment of random integral equations we
refer to Bharucha-Reid [3,5], Becus [2], Christensen and
Bharucha-Reid [6], Lax [8,9,10], Medhin and Sambandham [11,12],
Sambandham [14,15], and Tsokos and Padgett [16]. Among
others, the method of successive approximation, stochastic
approximation methods, the method of moments, the method of averaging,
the projection method, Newton's method, etc. are used to obtain
the numerical solution of random integral equations. Most
of the results in the literature are for linear or
one-dimensional equations. For the numerical treatment of deterministic
integral equations we refer to Baker [1].

In this article we consider a system of random Volterra
integral equations. By an application of the successive
approximation method and Newton's method, we examine the method of
obtaining the numerical solution of a system of random Volterra
integral equations. We organize our article as follows.

By an application of Newton's method and the successive
approximation method, we develop useful numerical procedures
in Sections 2 and 3, respectively. In Section 4 we include a short
discussion.

2. NEWTON'S METHOD

In this section we use Newton's method to obtain the
numerical solution of a system of random Volterra integral
equations.

Let $(\Omega, F, P)$ be a complete probability space and let $X$ be
a separable Hilbert space with inner product $\langle\cdot,\cdot\rangle$. A mapping
$T: \Omega \times X \to X$ is said to be random if and only if the function
$\langle T(\omega)x, y\rangle$ is a scalar valued random variable for every
$x, y \in X$. In other words, $T(\omega)$ is a random operator if and
only if $T(\omega)x$ is an $X$-valued random variable for every $x \in X$.

A random operator is linear if $T(\omega)(\alpha x_1 + \beta x_2) = \alpha T(\omega)x_1 + \beta T(\omega)x_2$
a.s. for every $x_1, x_2 \in X$ and scalars $\alpha, \beta$. $T(\omega)$ is
said to be a bounded random operator if there exists a
nonnegative real-valued random variable $M(\omega)$ such that for all
$x_1, x_2 \in X$, $\|T(\omega)x_1 - T(\omega)x_2\| \le M(\omega)\|x_1 - x_2\|$ a.s. If $T(\omega)$ is
a bounded random operator on $X$, then $T^{-1}(\omega)$ is a random
operator which maps $T(\omega)x$ into $x$ a.s. $T(\omega)$ is said to be
invertible if $T^{-1}(\omega)$ exists. If $T(\omega)$ is an invertible random
operator which is bounded, then $T^{-1}(\omega)$ is a bounded random
operator also.

Now let us consider the random operator equation
$T(\omega)x(\omega) = x(\omega)$, where $T(\omega): \Omega \times X \to X$ is a random nonlinear
operator. For our purpose it is enough if $T(\omega)$ is defined on
an open set $U$ of $X$. Let $T(\omega)$ be continuously differentiable
a.s. and let $x_0(\omega): \Omega \to U$ be an $X$-valued random variable such
that $[I - T'(\omega)x_0]^{-1}: \Omega \times X \to X$ be defined and bounded, where
$T'(\omega)x_0$ is random. It follows from the theorems of Hans [7]
and Nashed-Salehi [13] that $[I - T'(\omega)x_0(\omega)]^{-1}$ is a random
bounded linear operator.

Let $k(\omega): \Omega \to (0,1)$ be a real valued random variable;
and let $T_0^{-1}(\omega)$ be a bounded linear random operator such that
$\|T_0^{-1}(\omega)[I - T'(\omega)x_0(\omega)] - I\| \le k(\omega) < 1$. We denote by
$B(x_0, r)$ the collection of all $X$-valued random variables $x(\omega)$
such that $x(\omega) \in U$ and $\|x(\omega) - x_0(\omega)\| \le r$. Then if

\[ \|T_0^{-1}(\omega)[I - T(\omega)]x_0(\omega)\| < r(1 - k(\omega)), \]

there exists a random variable $x(\omega) \in B(x_0, r)$ such that
$T(\omega)x(\omega) = x(\omega)$ a.s. (that is, $x(\omega)$ is a fixed point of $T(\omega)$),
and the sequence of $X$-valued random variables defined by

\[ x_{n+1}(\omega) = x_n(\omega) - T_0^{-1}(\omega)[I - T(\omega)]x_n(\omega), \qquad (2.1) \]

$n = 0,1,2,\ldots$, converges to $x(\omega)$ a.s. For further discussion
on this line we refer to [5,12].
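
A scalar analogue may help fix ideas about the iteration (2.1). In the sketch below the random operator is taken to be $T(\omega)x = a(\omega)\cos x$ with $a(\omega)$ uniform on (0.5, 1), and the fixed operator $T_0^{-1}$ is the reciprocal of $1 - T'(x_0)$ evaluated once at the starting point. The operator, the distribution of $a$, and all names are hypothetical and chosen only to show the structure of the iteration on each sample.

```python
import math
import random


def newton_type_iteration(T, T0_inv, x0, n_iter=50):
    """Iterate x_{n+1} = x_n - T0_inv * (x_n - T(x_n)) for a scalar problem."""
    x = x0
    for _ in range(n_iter):
        x = x - T0_inv * (x - T(x))
    return x


def solve_sample(a, x0=0.5):
    """Fixed point of the hypothetical random operator T(w) x = a(w) cos(x)."""
    def T(x):
        return a * math.cos(x)

    # T0_inv approximates [I - T'(x0)]^{-1}; here T'(x) = -a sin(x).
    T0_inv = 1.0 / (1.0 + a * math.sin(x0))
    return newton_type_iteration(T, T0_inv, x0)


random.seed(0)
samples = [random.uniform(0.5, 1.0) for _ in range(200)]
roots = [solve_sample(a) for a in samples]
# Largest fixed-point residual |x - T(w) x| over all samples.
residual = max(abs(x - a * math.cos(x)) for a, x in zip(samples, roots))
```

Because $T_0^{-1}$ is frozen at $x_0$, convergence is linear rather than quadratic, with a rate governed by how far $T_0^{-1}[I - T'(x)]$ departs from the identity near the fixed point.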


As an application, we implement the above method to the
following system. Consider

\[ f_1(t,\omega) = g_1(t,\omega) + \int_0^t k_1(t,\tau,\omega)\, M_1(\tau, f_1(\tau,\omega), f_2(\tau,\omega), v_1(\omega))\, d\tau \]
\[ f_2(t,\omega) = g_2(t,\omega) + \int_0^t k_2(t,\tau,\omega)\, M_2(\tau, f_1(\tau,\omega), f_2(\tau,\omega), v_2(\omega))\, d\tau, \qquad (2.2) \]

where the solution vector is $(f_1(t,\omega), f_2(t,\omega))$ and the functions
$g_i$, $k_i$, $M_i$, $i = 1,2$ are assumed to be well defined so that
$(f_1(t,\omega), f_2(t,\omega))$ exists with probability 1. Let

\[ r_i^k(t,\omega) = f_i^k(t,\omega) - \int_0^t k_i(t,\tau,\omega)\, M_i(\tau, f_1^k(\tau,\omega), f_2^k(\tau,\omega), v_i(\omega))\, d\tau - g_i(t,\omega), \]
\[ i = 1,2, \quad k = 0,1,2,3,\ldots \qquad (2.3) \]
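
\[ e_1^k(t,\omega) - \int_0^t \big[ k_1(t,\tau,\omega)\, \partial_{v_1} M_1(\tau, f_1^k(\tau,\omega), f_2^k(\tau,\omega), v_1(\omega))\, e_1^k(\tau,\omega) \]
\[ \qquad + k_1(t,\tau,\omega)\, \partial_{v_2} M_1(\tau, f_1^k(\tau,\omega), f_2^k(\tau,\omega), v_1(\omega))\, e_2^k(\tau,\omega) \big]\, d\tau = r_1^k(t,\omega) \qquad (2.4) \]

\[ e_2^k(t,\omega) - \int_0^t \big[ k_2(t,\tau,\omega)\, \partial_{v_1} M_2(\tau, f_1^k(\tau,\omega), f_2^k(\tau,\omega), v_2(\omega))\, e_1^k(\tau,\omega) \]
\[ \qquad + k_2(t,\tau,\omega)\, \partial_{v_2} M_2(\tau, f_1^k(\tau,\omega), f_2^k(\tau,\omega), v_2(\omega))\, e_2^k(\tau,\omega) \big]\, d\tau = r_2^k(t,\omega) \qquad (2.5) \]

where $k = 0,1,2,\ldots$ and

\[ \partial_{v_1} M_i(t, v_1, v_2, v) = \frac{\partial}{\partial v_1} M_i(t, v_1, v_2, v), \qquad
   \partial_{v_2} M_i(t, v_1, v_2, v) = \frac{\partial}{\partial v_2} M_i(t, v_1, v_2, v), \qquad i = 1,2. \]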


Now we set

$$f_i^{k+1}(t,\omega) = f_i^k(t,\omega) - e_i^k(t,\omega), \quad i = 1,2, \; k = 0,1,2,\ldots \qquad (2.6)$$

According to Newton's iteration, $(f_1^k(t,\omega), f_2^k(t,\omega))$ converges to the solution of (2.2).

Numerically, the integrals in (2.3)-(2.5) can be evaluated by any suitable procedure, for example a collocation or quadrature method.

Example 2.1: In the following example we illustrate the above procedure. Let

$$f_1(t) + \int_0^t t(1-\tau)\,[f_1^2(\tau) + f_2(\tau) + v(\omega)]\,d\tau = g_1(t,\omega),$$
$$f_2(t) + \int_0^t (t-\tau)\,[f_1(\tau) + f_2^2(\tau) + v(\omega)]\,d\tau = g_2(t,\omega). \qquad (2.7)$$

Then (2.3)-(2.5) reduce to

$$r_1^k(t,\omega) = f_1^k(t) + \int_0^t t(1-\tau)\,[(f_1^k(\tau))^2 + f_2^k(\tau) + v(\omega)]\,d\tau - g_1(t,\omega),$$
$$r_2^k(t,\omega) = f_2^k(t) + \int_0^t (t-\tau)\,[f_1^k(\tau) + (f_2^k(\tau))^2 + v(\omega)]\,d\tau - g_2(t,\omega), \qquad (2.8)$$

with

$$g_1(t,\omega) = t + t^2 - \frac{t^3}{2} + \frac{t^4}{3} - \frac{t^5}{4} + R(\omega),$$
$$g_2(t,\omega) = 1 + \frac{t^2}{2} + \frac{t^3}{6} + R(\omega),$$


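$$e_1^k(t,\omega) + \int_0^t \{2t(1-\tau)\,f_1^k(\tau,\omega)\,e_1^k(\tau,\omega) + t(1-\tau)\,e_2^k(\tau,\omega)\}\,d\tau = r_1^k(t,\omega),$$
$$e_2^k(t,\omega) + \int_0^t \{(t-\tau)\,e_1^k(\tau,\omega) + 2(t-\tau)\,f_2^k(\tau,\omega)\,e_2^k(\tau,\omega)\}\,d\tau = r_2^k(t,\omega), \qquad (2.9)$$

$$(f_1^0(t,\omega), f_2^0(t,\omega)) = (R(\omega), 1 + R(\omega)). \qquad (2.10)$$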

Using  (2.6),  (2.8) — (2.10)  the  numerical  solution  of  (2.7)  can 
be  obtained. 
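The iteration for Example 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' program: the integrals are discretized with the trapezoidal rule on a uniform grid over [0, 1], the linearized equations for the corrections are solved by forward substitution in time (possible because the equations are of Volterra type), and the update is taken as f^{k+1} = f^k - e^k. The coefficients of g_1 and g_2 are assumed to be those for which (t, 1) solves the noise-free system; the names `newton_solve` and `simulate` are hypothetical.

```python
import random
import statistics


def trap_weight(i, j, h):
    """Trapezoidal-rule weight of node i for the integral over [0, t_j]."""
    if j == 0:
        return 0.0
    return h / 2 if i in (0, j) else h


def newton_solve(v, R, n=100, tol=1e-3, max_iter=50):
    """Newton iteration for one sample (v, R) of Example 2.1 on [0, 1]."""
    h = 1.0 / n
    t = [j * h for j in range(n + 1)]
    # Assumed right-hand sides: (t, 1) solves the system when v = R = 0.
    g1 = [s + s**2 - s**3 / 2 + s**4 / 3 - s**5 / 4 + R for s in t]
    g2 = [1 + s**2 / 2 + s**3 / 6 + R for s in t]
    f1 = [float(R)] * (n + 1)        # initial guess (R, 1 + R)
    f2 = [1.0 + R] * (n + 1)
    for _ in range(max_iter):
        e1 = [0.0] * (n + 1)
        e2 = [0.0] * (n + 1)
        for j in range(n + 1):
            q1 = q2 = s1 = s2 = 0.0
            for i in range(j + 1):
                w = trap_weight(i, j, h)
                k1 = w * t[j] * (1 - t[i])
                k2 = w * (t[j] - t[i])
                q1 += k1 * (f1[i] ** 2 + f2[i] + v)     # integral in r_1
                q2 += k2 * (f1[i] + f2[i] ** 2 + v)     # integral in r_2
                if i < j:                               # already known corrections
                    s1 += k1 * (2 * f1[i] * e1[i] + e2[i])
                    s2 += k2 * (e1[i] + 2 * f2[i] * e2[i])
            r1 = f1[j] + q1 - g1[j]
            r2 = f2[j] + q2 - g2[j]
            d = trap_weight(j, j, h) * t[j] * (1 - t[j])
            e2[j] = r2 - s2          # the kernel (t - tau) vanishes at tau = t
            e1[j] = (r1 - s1 - d * e2[j]) / (1 + 2 * d * f1[j])
        f1 = [a - b for a, b in zip(f1, e1)]
        f2 = [a - b for a, b in zip(f2, e2)]
        if max(max(map(abs, e1)), max(map(abs, e2))) < tol:
            break
    return t, f1, f2


def simulate(sigma_v, a, samples=30, n=100, seed=0):
    """Sample mean and variance of f_1(1) and f_2(1) over random (v, R)."""
    rng = random.Random(seed)
    end1, end2 = [], []
    for _ in range(samples):
        v = rng.gauss(0.0, sigma_v)      # v ~ N(0, sigma_v^2)
        R = rng.uniform(-a, a)           # R ~ U(-a, a)
        _, f1, f2 = newton_solve(v, R, n=n)
        end1.append(f1[-1])
        end2.append(f2[-1])
    return (statistics.mean(end1), statistics.variance(end1),
            statistics.mean(end2), statistics.variance(end2))
```

Calling `simulate(0.002, 0.001)` corresponds to the setting with the smallest noise level; the statistics it returns depend on the random seed and are not expected to reproduce the tabulated values digit for digit.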

Let $N(m, \sigma^2)$ denote the normal distribution with mean $m$ and variance $\sigma^2$, and let $U(a,b)$ denote the uniform distribution on the interval $(a,b)$. In our numerical experiment we have taken the following:

Example 2.1 (a): $v(\omega) \in N(0, .002^2)$, $R(\omega) \in U(-.001, .001)$
Example 2.1 (b): $v(\omega) \in N(0, .02^2)$, $R(\omega) \in U(-.01, .01)$
Example 2.1 (c): $v(\omega) \in N(0, .02^2)$, $R(\omega) \in U(-.1, .1)$
Example 2.1 (d): $v(\omega) \in N(0, .2^2)$, $R(\omega) \in U(-.1, .1)$.

We used the trapezoidal rule in (2.8) and (2.9) for numerical integration. Our simulation results are presented in Tables 2.1 - 2.4 and Figures 2.1 - 2.4. These results are based on 30 samples, and in each sample the iteration was repeated until $|f_i^{k+1}(t,\omega) - f_i^k(t,\omega)| < .001$, $i = 1,2$. In Figures 2.1 - 2.4, oooo and •••• denote respectively $E(f_1(t,\omega))$ and $E(f_2(t,\omega))$, and •••• and - denote respectively $V(f_1(t,\omega))$ and $V(f_2(t,\omega))$.
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Table  2.1 


$v(\omega) \in N(0, 0.002^2)$, $R(\omega) \in U(-0.001, 0.001)$

   t      E(f1)       V(f1)       E(f2)       V(f2)
   1     .009964     .000004     .999676     .000004
  10     .099967     .000004     .999681     .000004
  20     .199975     .000004     .999693     .000004
  30     .299988     .000003     .999712     .000003
  40     .400006     .000003     .999739     .000003
  50     .500026     .000003     .999774     .000002
  60     .600036     .000003     .999800     .000002
  70     .700045     .000003     .999827     .000001
  80     .800039     .000003     .999829     .000001
  90     .900031     .000003     .999811     .000001
 100    1.000028     .000003     .999764     .000001

Table 2.2.

$v(\omega) \in N(0, 0.02^2)$, $R(\omega) \in U(-0.01, 0.01)$

   t      E(f1)       V(f1)       E(f2)       V(f2)
   1     .009639     .000361     .996755     .000388
  10     .099666     .000360     .996789     .000380
  20     .199741     .000355     .996884     .000357
  30     .299841     .000344     .997024     .000319
  40     .399931     .000329     .997175     .000272
  50     .499988     .000310     .997310     .000218
  60     .599987     .000290     .997383     .000163
  70     .699914     .000271     .997331     .000114
  80     .799772     .000256     .997067     .000079
  90     .899621     .000244     .996521     .000064
 100     .999542     .000240     .995575     .000082
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Table  2.3. 


$v(\omega) \in N(0, 0.02^2)$, $R(\omega) \in U(-.1, .1)$

   t      E(f1)       V(f1)       E(f2)       V(f2)
   1     .009639     .000361     .996755     .000388
  10     .099663     .000360     .996785     .000380
  20     .199736     .000355     .996876     .000357
  30     .299834     .000344     .997012     .000319
  40     .399922     .000328     .997160     .000272
  50     .499974     .000310     .997288     .000218
  60     .599971     .000290     .997355     .000164
  70     .699886     .000271     .997282     .000115
  80     .799745     .000255     .997014     .000079
  90     .899590     .000244     .996455     .000065
 100     .999513     .000239     .995503     .000083

Table 2.4

$v(\omega) \in N(0, 0.2^2)$, $R(\omega) \in U(-.1, .1)$

   t      E(f1)       V(f1)       E(f2)       V(f2)
   1     .006387     .036141     .967551     .038808
  10     .096353     .036049     .967691     .038019
  20     .196225     .035588     .968063     .035683
  30     .295926     .034612     .968529     .031979
  40     .395261     .033163     .968807     .027200
  50     .494169     .031373     .968641     .021754
  60     .592422     .029403     .967498     .016252
  70     .690021     .027502     .964827     .011429
  80     .787226     .025913     .960106     .008192
  90     .884353     .024770     .952363     .007708
 100     .982290     .024262     .940524     .011188
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Fig.  2.3:  Example  2.1  (c) 


Fig.  2.4:  Example  2.1  (d) 


3.  SUCCESSIVE  APPROXIMATION 

In this section we use successive approximation for the numerical treatment of a system of random Volterra integral equations. In the following we state a few useful results from Tsokos and Padgett [16]. Let $C_g$ be the space of all continuous functions from $R_+$ into $L_2(\Omega, A, P)$, where $(\Omega, A, P)$ is a probability space, such that

$$\|x(t,\omega)\| = \Big\{\int_\Omega |x(t,\omega)|^2\,dP(\omega)\Big\}^{1/2} \le z\,g(t),$$

where $t \in R_+$, $z \ge 0$ and $g$ is a continuous function on $R_+$. Let $C_c$ be the space of all continuous functions from $R_+$ into $L_2(\Omega, A, P)$ with the topology of uniform convergence on the interval $[0,T]$ for any $T > 0$. Consider the random integral equation

$$x(t,\omega) = h(t,\omega) + \int_0^t k(t,\tau)\,f(\tau, x(\tau,\omega))\,d\tau, \qquad (3.1)$$

where the following hypotheses hold.

(i) The function $(t,\tau) \to k(t,\tau)$ from the set $\Delta = \{(t,\tau) : 0 \le \tau \le t < \infty\}$ into $R$ is continuous.

(ii) There exist a number $A > 0$ and a continuous function (on $R_+$) $g(t) > 0$ such that

$$\int_0^t |k(t,\tau)|\,g(\tau)\,d\tau \le A, \quad t \in R_+.$$

(iii) $f(t,x)$ is a continuous vector-valued function for $t \in R_+$, $\|x(t,\omega)\| \le \rho$, such that $f(t,0) \in C_g$ and

$$\|f(t,x(t,\omega)) - f(t,y(t,\omega))\| \le \lambda\,g(t)\,\|x(t,\omega) - y(t,\omega)\|,$$

where $\lambda > 0$.

(iv) $h(t,\omega)$ is a bounded continuous function on $R_+$ with values in $L_2(\Omega, A, P)$.

Then there exists a unique random solution $x(t,\omega) \in C_c(R_+, L_2(\Omega, A, P))$ of (3.1) such that

$$\|x(t,\omega)\|_c = \sup_{t \ge 0}\Big\{\int_\Omega |x(t,\omega)|^2\,dP(\omega)\Big\}^{1/2} \le \rho \qquad (3.2)$$

for $t \in R_+$, as long as $\|h(t,\omega)\|_c$, $\lambda$ and $\|f(t,0)\|_g$ are small enough. For a detailed proof refer to [11, 16]. The method we use is Picard-type successive approximation. We use a modified version of [16] to deal with systems and apply the results to write a discrete version of the successive approximation.


We  apply  the  method  in  the  following  example  to  obtain 
the  numerical  solution  of  a  system  of  random  Volterra  integral 
equations . 

Example  3.1:  Let 


1  — T 

x^tfO))  =  a1(t,w)  +  ^  /  xe  [x2  (x  ,u>) +sin  x^XfWjJdx, 


(3.3) 


1  u  X— t 

x2(t,w)  =  a2(t,w)  +  ^  /  e  [x1(t,w)+cos  x2(x,w)]dx, 
where 

a^ (t,w)  “  t  -  ^  (l+r2(a)) )  t2e_t  +  e_tt  cos ( tH-^ (cu)  ) 

-  e  sinCt+r^^  (<*>))  +  e  t  sin  r^  (w)  +  r^  (u>)  , 
a2(t,a>)  =  (t+r1(w))  -  (1-^  (u>) )  e_t  -  £  cos  (l+r2  (oj)  ) 

+  cos  (l+r2  ( co ) )  e  t  +  r2 (u) . 


By  discretization  (3.3)  can  be  written  as 

lr4-l  1  ^ _  ^  k  k 

x*  (Tn,0))  =  a1(Tn,(jj)  +  y  Z  x^e  n[x2(ii,u))+sin  xj(xi#w)]  x 


(Ti+i*Ti) 


(3.4) 


k  +  1  1  Ti”Tn  k  k 

x2  i(Tn  ,  u>)  =  a2(xn,w)  +  J  Z  e  n  [x*  (x.^  ,w)  +cos  x^t^w)]  x 


2 '  n ' 


llv  1 


‘21  li 


k  ~  )/lf2f... 

The theoretical solution of (3.3) is $(t + r_1(\omega), 1 + r_2(\omega))$. (3.4) can be simulated until $|x_i^{k+1}(t,\omega) - x_i^k(t,\omega)| < \varepsilon$, $i = 1,2$.
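The structure of a discrete Picard-type iteration with this stopping rule can be sketched as follows. The sketch treats a single scalar equation of the form (3.1) rather than the system of Example 3.1, uses the trapezoidal rule for the integral, and is checked on a hypothetical test problem that is not taken from the paper: x(t) = c - ∫_0^t x(τ) dτ, whose solution is c·exp(-t). The function name `picard` and all parameters are illustrative assumptions.

```python
import math


def picard(h_fun, kernel, f, n=200, T=1.0, tol=1e-3, max_iter=100):
    """Successive approximation for x(t) = h(t) + int_0^t k(t,s) f(s, x(s)) ds."""
    dt = T / n
    t = [j * dt for j in range(n + 1)]
    x = [h_fun(s) for s in t]                  # starting iterate x^0 = h
    for _ in range(max_iter):
        new = []
        for j in range(n + 1):
            acc = 0.0
            for i in range(j + 1):
                # trapezoidal weights on [0, t_j]
                w = 0.0 if j == 0 else (dt / 2 if i in (0, j) else dt)
                acc += w * kernel(t[j], t[i]) * f(t[i], x[i])
            new.append(h_fun(t[j]) + acc)
        change = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if change < tol:                       # stop when the iterates settle
            break
    return t, x


# Hypothetical test problem: x(t) = c - int_0^t x(s) ds, exact solution c*exp(-t).
c = 1.05
t, x = picard(lambda s: c, lambda tj, s: 1.0, lambda s, xs: -xs)
err = max(abs(xv - c * math.exp(-tv)) for tv, xv in zip(t, x))
```

For a random equation, the same routine would be called once per sampled realization of the random inputs, and the sample mean and variance of the resulting solutions would then be formed at each grid point.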

For our numerical experiment we have assumed the following:

Example 3.1 (a): $r_1(\omega), r_2(\omega) \in N(0, .002^2)$.
Example 3.1 (b): $r_1(\omega), r_2(\omega) \in N(0, .02^2)$.
Example 3.1 (c): $r_1(\omega), r_2(\omega) \in N(0, .2^2)$.
Example 3.1 (d): $r_1(\omega), r_2(\omega) \in U(-.001, .001)$.
Example 3.1 (e): $r_1(\omega), r_2(\omega) \in U(-.01, .01)$.
Example 3.1 (f): $r_1(\omega), r_2(\omega) \in U(-.1, .1)$.

We have presented our simulation results in Tables 3.1-3.6 and Figures 3.1-3.6. These results are based on 30 samples, and each iteration was repeated until $|x_i^{k+1}(t,\omega) - x_i^k(t,\omega)| < .001$, $i = 1,2$. In Figures 3.1-3.6, oooo and •••• denote respectively $E(x_1(t,\omega))$ and $E(x_2(t,\omega))$, and •••• and - denote respectively $V(x_1(t,\omega))$ and $V(x_2(t,\omega))$.

4.  DISCUSSION 


The foregoing techniques demonstrate the usefulness and simplicity of applying Newton's and successive approximation methods to a system of random Volterra integral equations. Our numerical results show that the mean of the sample solutions converges to the mean of the theoretical solution as the variance decreases.


Other statistical parameters one can look at are risk functionals, confinement probability, skewness, kurtosis, correlation, etc. of the random solutions. The distribution of the numerical solution at different time units may also be of interest.


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Table  3.1 


$r_1(\omega), r_2(\omega) \in N(0, 0.002^2)$

   t      E(x1)       V(x1)       E(x2)       V(x2)
   1     .009867     .000030    1.009701     .000002
  10     .097103     .000031    1.001172     .000004
  20     .197879     .000032    1.001358     .000005
  30     .300248     .000029    1.001692     .000005
  40     .404133     .000029    1.001025     .000004
  50     .508920     .000027    1.007050     .000004
  60     .615807     .000033    1.007915     .000005
  70     .724170     .000029    1.007578     .000004
  80     .833203     .000037    1.007702     .000006
  90     .943027     .000028    1.007992     .000005
 100    1.052344     .000031    1.008623     .000004

Table 3.2

$r_1(\omega), r_2(\omega) \in N(0, 0.02^2)$

   t      E(x1)       V(x1)       E(x2)       V(x2)
   1     .008556     .000357    1.018219     .000181
  10     .098695     .000656     .999261     .000364
  20     .197177     .000569    1.000663     .000513
  30     .300128     .000444    1.003236     .000471
  40     .404506     .000341     .995288     .000448
  50     .509057     .000333    1.004797     .000441
  60     .615514     .000486    1.014265     .000542
  70     .725338     .000466    1.009957     .000383
  80     .832995     .000515    1.008462     .000647
  90     .942605     .000335    1.006821     .000461
 100    1.044258     .000323    1.006896     .000408
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Table  3.3 


$r_1(\omega), r_2(\omega) \in N(0, 0.2^2)$

   t      E(x1)       V(x1)       E(x2)       V(x2)
   1    -.004550     .046929    1.012795     .009134
  10     .114598     .060710     .980120     .036398
  20     .190134     .050747     .993800     .051145
  30     .298866     .040816    1.018676     .046974
  40     .408112     .030350     .937967     .044455
  50     .510469     .032418     .981624     .043917
  60     .612422     .043692    1.077264     .054399
  70     .736662     .046105    1.032924     .038294
  80     .830384     .043630    1.015674     .064831
  90     .937779     .032423     .994332     .045982
 100     .962477     .045704     .988586     .047095

Table 3.4

$r_1(\omega), r_2(\omega) \in U(-0.002, 0.002)$

   t      E(x1)       V(x1)       E(x2)       V(x2)
   1     .010000     .000024    1.009160     .000000
  10     .096964     .000023    1.001347     .000000
  20     .197954     .000024    1.001442     .000000
  30     .300235     .000024    1.001595     .000000
  40     .404107     .000025    1.001491     .000000
  50     .508935     .000025    1.007220     .000000
  60     .615796     .000027    1.007402     .000000
  70     .724064     .000026    1.007389     .000000
  80     .833179     .000028    1.007598     .000001
  90     .943050     .000027    1.008064     .000000
 100    1.052986     .000027    1.008703     .000009
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Table  3.5 


$r_1(\omega), r_2(\omega) \in U(-.02, .02)$

   t      E(x1)       V(x1)       E(x2)       V(x2)
   1     .009886     .000067    1.002753     .000018
  10     .097296     .000070    1.001005     .000032
  20     .197930     .000068    1.001496     .000040
  30     .299994     .000065    1.002268     .000038
  40     .404240     .000057     .999945     .000038
  50     .509209     .000047    1.006492     .000041
  60     .615406     .000063    1.009133     .000044
  70     .724278     .000061    1.008070     .000035
  80     .832759     .000075    1.007418     .000052
  90     .942839     .000056    1.007543     .000039
 100    1.050695     .000065    1.007700     .000030

Table 3.6

$r_1(\omega), r_2(\omega) \in U(-.2, .2)$

   t      E(x1)       V(x1)       E(x2)       V(x2)
   1     .008749     .003090     .951389     .003010
  10     .100586     .004457     .997168     .003305
  20     .197686     .003644    1.002044     .004044
  30     .297588     .003728    1.008994     .003835
  40     .405573     .002929     .984492     .003804
  50     .511933     .002624     .999228     .004051
  60     .611481     .003274    1.026460     .004380
  70     .726380     .004078    1.014866     .003524
  80     .828506     .003568    1.005627     .005222
  90     .940660     .003556    1.002309     .003895
 100    1.027694     .003709     .997618     .004123
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APPROXIMATE  METHODS  FOR 
STRUCTURAL  RELIABILITY 
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Department  of  Structural  Reliability 
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Ithaca,  N.Y.  14850 


ABSTRACT. Structural reliability can be defined as the probability that a random process X(t) remains in a domain of safe structural performance during a reference period. The process can model material properties, environmental loads, or outputs of mechanical systems subject to random inputs. In the time-independent case X(t) = X and reliability can be approximated from a scalar quantity, the reliability index. This approximation is evaluated for Gaussian and non-Gaussian vectors and safe domains of various shapes. In the time-dependent case reliability is approximated by the mean rate at which X(t) crosses out of the safe domain. When X(t) is non-Gaussian it can be approximated by a memoryless transformation of a Gaussian process, called a translation process. Translation processes have the same marginal distributions as the original process X(t) and similar crossing properties. The approximate method of reliability analysis based on translation processes is applied to several non-Gaussian processes and safe domains.

I. INTRODUCTION. Consider a structural component of strength $X_2$ subject to an uncertain axial load $X_1$. The reliability of the component is equal to the probability $P_S = P\{X_1 \le X_2\}$ and can be determined from the probability content of the safe domain $D = \{(x_1, x_2) : x_1 - x_2 \le 0\}$. The component fails with the probability $P_F = 1 - P_S$. In general reliability problems the vector $X = (X_1, \ldots, X_n)$ of uncertain parameters is n-dimensional and can be time-dependent. The safe domain $D = \{x : g(x) \le 0\}$ is a region in $R^n$, and the boundary $\partial D = \{x : g(x) = 0\}$ of D is usually referred to as the limit state.

The objective of this paper is to examine and evaluate approximate methods for calculating the reliability $P_S$ in general reliability problems involving time-invariant and time-dependent random vectors X. It is assumed in the analysis of time-dependent problems that failure occurs at the first excursion out of the safe domain. Thus, failures due to changes in material characteristics under constant stress or damage accumulation caused by repeated loads are not investigated.
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II. TIME  INVARIANT  PROBLEMS.  First  let  X  be  a  Gaussian 
vector. Without loss of generality it is assumed that X has independent components of zero mean and unit variance. The reliability is

$$P_S = \int_D \varphi(x)\,dx \qquad (1)$$

in which $\varphi(x) = (2\pi)^{-n/2}\exp\{-\tfrac{1}{2}\sum_{i=1}^{n} x_i^2\}$. The determination of $P_S$ in Eq. (1) usually requires numerical integration and, as a result, is impractical when $n \ge 3$. However, the reliability can be obtained in closed form in two cases corresponding to safe domains bounded by hyperplanes and hyperspheres. It is $\Phi(\beta)$ for linear limit states at a distance $\beta$ from the origin and $F_{\chi_n}(\beta)$ for spherical limit states with radius $\beta$ centered at the origin. $\Phi$ and $F_{\chi_n}$ denote, respectively, the standard Gaussian distribution and the chi distribution with n degrees of freedom.

The reliabilities corresponding to linear and spherical limit states can be employed to develop probability bounds for general domains. Consider, for example, the safe domain D in Figure 1 and let $x_0$ be the point of $\partial D$ closest to the origin, assumed to be unique. This point is usually referred to as the β-point and is at a distance $\beta = |x_0|$ from the origin. If D is convex and the function g(x) can be differentiated, the reliability $P_S$ is bounded by

$$F_{\chi_n}(\beta) \le P_S \le \Phi(\beta) \qquad (2)$$

The bounds $F_{\chi_n}(\beta)$ and $\Phi(\beta)$ on $P_S$ are attractively simple. However, they become less informative as the dimension n of the space increases, as shown in Figure 2. Therefore, other methods are needed to approximate $P_S$.
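The two bounds in Eq. (2) are easy to evaluate numerically. The following sketch uses only the Python standard library; the chi distribution function is computed as the regularized lower incomplete gamma function P(n/2, β²/2) by its power series, which is a choice made here for self-containedness. The function names are hypothetical.

```python
import math


def std_normal_cdf(b):
    """Standard Gaussian distribution function Phi(b)."""
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))


def chi_cdf(b, n, terms=500):
    """Chi distribution function with n degrees of freedom at b,
    i.e. the regularized lower incomplete gamma function P(n/2, b^2/2)."""
    if b <= 0:
        return 0.0
    a, x = n / 2.0, b * b / 2.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)                  # next term of the power series
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def reliability_bounds(beta, n):
    """Lower and upper bounds on P_S for a convex safe domain, as in Eq. (2)."""
    return chi_cdf(beta, n), std_normal_cdf(beta)


# The lower bound deteriorates as the dimension n grows while beta stays fixed.
widths = [reliability_bounds(3.0, n)[1] - reliability_bounds(3.0, n)[0]
          for n in (1, 2, 5, 10)]
```

The list `widths` holds the gap between the two bounds for β = 3 and increasing n, which illustrates why the bounds become less informative in higher dimensions.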

The most accurate approximation available for $P_S$ is based on an asymptotic evaluation of the integral in Eq. (1) as the distance β to the β-point increases indefinitely. It can be shown that $P_F$ can be approximated asymptotically as $\beta \to \infty$ by [1]

$$P_{F,a} = \Phi(-\beta)\prod_{i=1}^{n-1}(1 - k_i)^{-1/2} \qquad (3)$$

in which $k_i$ are the principal curvatures of the limit state at the β-point, assumed to satisfy the conditions $1 > k_1 \ge k_2 \ge \cdots \ge k_{n-1}$ when β = 1. The same asymptotic result can be obtained for $P_F$ if the limit state is approximated in a small neighborhood of the β-point by a quadratic tangent to the limit state at $x_0$. Equation 3 can be generalized to the case where the limit state has finitely many β-points. In this case $P_F$ is asymptotically equal to a sum of terms as in Eq. (3) corresponding to each β-point.
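A minimal sketch of the asymptotic formula follows. It assumes that Eq. (3) has the standard form Φ(-β) ∏ (1 - k_i)^(-1/2), with the curvatures k_i normalized to β = 1 as stated in the text; the function name and the example curvature values are hypothetical.

```python
import math


def asymptotic_failure_probability(beta, curvatures):
    """Asymptotic estimate of P_F for a single beta-point.

    `curvatures` holds the n-1 principal curvatures k_i (normalized to
    beta = 1); each must be smaller than one.
    """
    phi_minus_beta = 0.5 * math.erfc(beta / math.sqrt(2.0))   # Phi(-beta)
    factor = 1.0
    for k in curvatures:
        if k >= 1.0:
            raise ValueError("each curvature must satisfy k < 1")
        factor *= (1.0 - k) ** -0.5
    return phi_minus_beta * factor


flat = asymptotic_failure_probability(3.0, [0.0, 0.0])    # hyperplane: Phi(-3)
curved = asymptotic_failure_probability(3.0, [0.75])      # one curvature of 0.75
```

For a limit state with several β-points, the corresponding terms would simply be summed.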

Figure 3 shows the ratio of the asymptotic probability of failure in Eq. (3), $P_{F,a}$, to the actual failure probability, $P_F$, for various elliptical domains with limit states $(x_1/a)^2 + x_2^2 = \beta^2$. As expected, the ratio of these probabilities approaches unity as β increases [1]. It is approximately one for large values of a because the problem becomes one-dimensional, in which case $P_{F,a}$ is equal to $P_F$.

An  alternative  method  for  calculating  Pp  can  be  based  on  a 

simulation  approach.  Brute  force  simulation  is  impractical 
because  Pp  is  usually  smaller  than  10  .  However,  an  efficient 

simulation  method  can  be  developed  to  estimate  Pp.  Assume  for 

simplicity  that  the  function  g  specifying  the  limit  state  is  the 
quadratic  form 

$$Z = g(X) = X^T a X + b^T X + c \qquad (4)$$

The Gaussian vector X can be expressed as $X = \Lambda R$, in which $\Lambda$ is a random vector uniformly distributed on the unit sphere in $R^n$ and R is a chi random variable with n degrees of freedom.

Conditional on $\Lambda = \lambda$, the quadratic form is

$$Z \mid \Lambda = \lambda \; = (\lambda^T a \lambda) R^2 + (b^T \lambda) R + c \qquad (5)$$

The conditional probability of failure is

$$P_F(\lambda) = P\{Z > 0 \mid \Lambda = \lambda\} \qquad (6)$$

and can be calculated by

$$P_F(\lambda) = F_{\chi_n}(r_1(\lambda)) + 1 - F_{\chi_n}(r_2(\lambda)) \qquad (7)$$

when, e.g., the roots $r_k(\lambda)$, k = 1,2, of $(\lambda^T a \lambda) r^2 + (b^T \lambda) r + c = 0$ are positive and $r_1(\lambda) \le r_2(\lambda)$.

An estimator of the probability of failure $P_F$ is

$$\hat{P}_F = \frac{1}{N}\sum_{j=1}^{N} P_F(\lambda_j) \qquad (8)$$

in which $\lambda_j$ are samples of $\Lambda$ and N denotes the number of these samples.
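The estimator in Eq. (8) can be sketched as below. The sketch makes several assumptions that go beyond the text: directions are sampled by normalizing a standard Gaussian vector, the conditional probability handles all sign and root configurations of the quadratic in R (Eq. (7) being the case of a positive leading coefficient and two positive roots), and the chi distribution function is evaluated by a power series. All function names are hypothetical.

```python
import math
import random


def chi_cdf(b, n, terms=500):
    """Chi distribution function with n degrees of freedom (power series)."""
    if b <= 0:
        return 0.0
    a, x = n / 2.0, b * b / 2.0
    term = total = 1.0 / a
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def conditional_pf(A, B, C, n):
    """P{A R^2 + B R + C > 0} for a chi variable R with n degrees of freedom."""
    if abs(A) < 1e-14:                           # degenerate: linear in R
        if abs(B) < 1e-14:
            return 1.0 if C > 0 else 0.0
        r0 = max(-C / B, 0.0)
        return 1.0 - chi_cdf(r0, n) if B > 0 else chi_cdf(r0, n)
    disc = B * B - 4.0 * A * C
    if disc <= 0:                                # no sign change on the ray
        return 1.0 if A > 0 else 0.0
    s = math.sqrt(disc)
    r1, r2 = sorted(((-B - s) / (2 * A), (-B + s) / (2 * A)))
    inside = chi_cdf(max(r2, 0.0), n) - chi_cdf(max(r1, 0.0), n)
    return 1.0 - inside if A > 0 else inside     # A > 0 reproduces Eq. (7)


def estimate_pf(a, b, c, N=100, seed=0):
    """Average of the conditional failure probabilities over N directions."""
    n = len(b)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        lam = [x / norm for x in g]              # uniform on the unit sphere
        A = sum(lam[i] * a[i][j] * lam[j] for i in range(n) for j in range(n))
        B = sum(b[i] * lam[i] for i in range(n))
        total += conditional_pf(A, B, c, n)
    return total / N


# Spherical limit state in R^2 with radius 3: every direction gives exp(-4.5).
sphere = estimate_pf([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], -9.0)
# Quadratic form x1^2 + x2^2 - x1*x2 - 16 for a standard Gaussian vector.
von_mises = estimate_pf([[1.0, -0.5], [-0.5, 1.0]], [0.0, 0.0], -16.0, N=2000)
```

Because each sample contributes an exact conditional probability rather than a zero-one indicator, a small number of directions can already give a usable estimate of a small failure probability.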

Tables 1 and 2 show values of $\hat{P}_F$ obtained in two samples of size N=100 for safe domains characterized by two quadratic forms:

$$z = x_1^2 + x_2^2 - x_1 x_2 - (f_y/f_0)^2 \qquad (9)$$

z  =  l  (x  +  |1  )2  -  e2  (10) 

i=l 

The  first  form  corresponds  to  the  von  Mises  strength  criterion 
while  the  second  one  corresponds  to  a  non-central  chi-square 

^  2 

variable  with  non-centrality  parameter  n  =  £  n  . .  From  the 

i=l  1 

tables, $\hat{P}_F$ satisfactorily approximates $P_F$ in both cases.

When the components of X are independent but do not follow Gaussian distributions, the techniques discussed in this section can still be applied if the x-space is mapped into a new space according to the transformation

$$U_i = \Phi^{-1} \circ F_i(X_i) \qquad (11)$$

in which $F_i$ is the distribution of $X_i$. The variables $U_i$ are independent and follow standard Gaussian distributions. Figure
4,  from  reference  [1]  shows  exact  values  of  Pp  and  asymptotic 

approximations  of  the  probability  of  a  safe  domain  defined  by  the 

condition  £  X.  -  n  -  ocJn  where  the  variables  X.  are  independent 
i=l  1 

identically  distributed  exponential  random  variables  with  unit 
mean.  The  asymptotic  approximation  is  also  satisfactory  in  this 
case. 
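The marginal transformation in Eq. (11) can be illustrated for the exponential variables of this example. The sketch below relies on `statistics.NormalDist` from the Python standard library for Φ and its inverse; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()          # standard Gaussian, mean 0 and variance 1


def exponential_cdf(x):
    """Distribution function of an exponential variable with unit mean."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def to_standard_gaussian(x, cdf):
    """Map a value x with distribution function `cdf` to u = Phi^{-1}(cdf(x)).

    NormalDist.inv_cdf requires 0 < cdf(x) < 1, so x must lie strictly
    inside the support of the distribution.
    """
    return STD_NORMAL.inv_cdf(cdf(x))


u_median = to_standard_gaussian(math.log(2.0), exponential_cdf)   # median maps to 0
u_tail = to_standard_gaussian(5.0, exponential_cdf)
```

After this mapping, a limit state expressed in the original variables becomes a generally nonlinear limit state in the standard Gaussian space, where the bounds and asymptotic approximations of this section apply.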


Techniques  are  also  available  to  map  vectors  of  dependent 
non-Gaussian  variables  into  Gaussian  vectors  with  independent 
components.  Following  such  transformations  one  can  directly 
apply  any  of  the  methods  developed  for  Gaussian  vectors. 
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Consider  a  process  X(t)  defined 


$$X(t) = \int_0^t h(t - \tau)\,g(Y(\tau))\,d\tau \qquad (12)$$

in which Y(t) is a stationary Gaussian vector process in $R^n$ with independent components. The process X(t) can be thought of as the response of a linear system with unit impulse response function h to the input g(Y(t)). On the other hand, if h in Eq. (12) is δ (the Dirac delta function), then X(t) = g(Y(t)) is a memoryless transformation of the Gaussian process Y(t). As in the time-invariant case, the safety condition requires that X(t) be smaller than an allowable threshold during a time period τ.

The reliability $P_S(\tau)$ can be approximated by

$$P_S(\tau) = P\{X(0) \le \beta\}\,\exp\{-\tau\,\nu(\beta)\} \qquad (13)$$

in which ν(x) is the mean rate at which X(t) crosses a level x from below. Note that the reliability $P_S(\tau)$ depends only on the mean upcrossing rate ν(x) and the marginal distribution of X(t).


Three special cases in which h = δ are now considered. First, let Y(t) be a univariate Gaussian process and g be the identity function. Then X(t) coincides with Y(t) and, according to the Rice formula [6],

$$\nu(x) = [\sigma_{\dot X}/2\pi\sigma_X]\,\exp\{-[(x - m_X)/\sigma_X]^2/2\} \qquad (14)$$

in which $m_X$ and $\sigma_X^2$ are the mean and variance of X(t) and $\sigma_{\dot X}^2$ is the variance of $\dot{X}(t)$.
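For a Gaussian process, Eqs. (13) and (14) combine into a short calculation. The following sketch assumes the forms ν(x) = [σ_Ẋ/(2πσ_X)] exp{-[(x - m_X)/σ_X]²/2} and P_S(τ) = P{X(0) ≤ β} exp{-τ ν(β)}; the function names and the numerical values in the usage lines are hypothetical.

```python
import math


def rice_upcrossing_rate(x, mean, sigma, sigma_dot):
    """Mean rate of upcrossings of level x by a stationary Gaussian process."""
    z = (x - mean) / sigma
    return sigma_dot / (2.0 * math.pi * sigma) * math.exp(-0.5 * z * z)


def gaussian_reliability(level, duration, mean, sigma, sigma_dot):
    """First-excursion reliability estimate for threshold `level` over `duration`."""
    p_start_safe = 0.5 * (1.0 + math.erf((level - mean) / (sigma * math.sqrt(2.0))))
    nu = rice_upcrossing_rate(level, mean, sigma, sigma_dot)
    return p_start_safe * math.exp(-duration * nu)


nu_at_mean = rice_upcrossing_rate(0.0, 0.0, 1.0, 2.0)        # sigma_dot / (2 pi sigma)
p_short = gaussian_reliability(3.0, 10.0, 0.0, 1.0, 2.0)
p_long = gaussian_reliability(3.0, 100.0, 0.0, 1.0, 2.0)
```

The estimate decreases with the duration of the reference period and increases with the threshold, as expected of a first-excursion reliability.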


Second, let Y(t) be a univariate Gaussian process with zero mean and unit variance and $g = F_X^{-1} \circ \Phi$, where $F_X$ is any continuous distribution. The process X(t) in this case is called a translation process, and the marginal distribution of X(t) at any time t is $F_X$. If Y(t) is stationary and differentiable, so is X(t). The standard deviations of the derivatives of these processes are related by $\sigma_{\dot Y} = \eta\,\sigma_{\dot X}/\sigma_X$, in which $\eta^2 = \sigma_X^2\,\{E[g'(Y(t))^2]\}^{-1}$. The constant η is generally close to unity [3]. The mean upcrossing rate of X(t) can be obtained from Eq. (14) and is

$$\nu(x) = [\sigma_{\dot Y}/2\pi]\,\exp\{-[g^{-1}(x)]^2/2\} = [\eta\,\sigma_{\dot X}/2\pi\sigma_X]\,\exp\{-[g^{-1}(x)]^2/2\}. \qquad (15)$$

In this case ν(x) depends only on $F_X$ and $\sigma_{\dot Y}$.


Consider any process that can be partially specified by its marginal distribution and covariance function. Such a process can be approximated by a translation process having the correct marginal distribution and covariance function. The mean upcrossing rate of this process can be approximated by Eq. (15). In many cases the translation approximation is conservative, meaning that it overestimates the actual value of ν(x).
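A translation-process upcrossing rate can be sketched directly from the marginal distribution. The sketch assumes the form ν(x) = [σ_Ẏ/(2π)] exp{-[g⁻¹(x)]²/2} with g⁻¹(x) = Φ⁻¹(F_X(x)); the exponential marginal used in the usage lines is a hypothetical choice, as are the function names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def translation_upcrossing_rate(x, marginal_cdf, sigma_ydot):
    """Upcrossing rate of level x for X(t) = g(Y(t)), Y standard Gaussian.

    `marginal_cdf` is the target marginal distribution F_X and `sigma_ydot`
    is the standard deviation of the derivative of Y(t).
    """
    u = STD_NORMAL.inv_cdf(marginal_cdf(x))      # g^{-1}(x) = Phi^{-1}(F_X(x))
    return sigma_ydot / (2.0 * math.pi) * math.exp(-0.5 * u * u)


# Check: with a standard Gaussian marginal the Rice formula is recovered.
gauss_rate = translation_upcrossing_rate(1.5, STD_NORMAL.cdf, 2.0)
rice_rate = 2.0 / (2.0 * math.pi) * math.exp(-0.5 * 1.5 ** 2)

# Hypothetical non-Gaussian marginal: exponential with unit mean.
exp_rate = translation_upcrossing_rate(4.0, lambda x: 1.0 - math.exp(-x), 2.0)
```

Only the marginal distribution and σ_Ẏ enter the calculation, which is what makes the translation approximation convenient when the full probability law of the process is not available.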

A third type of memoryless transformation of Y(t) is

$$X(t) = Y(t)^T a\,Y(t) + b^T Y(t) + c \qquad (16)$$

Two special cases of this quadratic form are examined. Consider first the time-dependent form of the von Mises criterion that requires $X(t) = Y_1^2(t) + Y_2^2(t) - Y_1(t)Y_2(t)$ to be smaller than a limit value x, where $Y(t) = (Y_1(t), Y_2(t))^T$ is a bivariate Gaussian process with independent components having mean zero and unit variance. The mean upcrossing rate of X(t) is [2]

i>(x)  =  (2x/3) 1//2  e  2x  F  (2  +  cos  u)^  exp|x  cos  uf  du 

J0  (17) 


From the density of X(t) a translation approximation may be obtained. Table 3 compares mean upcrossing rates $\nu_T$ for the translation approximation with the exact mean upcrossing rate in Eq. (17). The translation approximation is seen to be conservative for moderate to large thresholds. As another example, let $X(t) = \sum_{j=1}^{n} Y_j^2(t)$ be a chi-square process with n degrees of freedom in which $Y(t) = (Y_1(t), \ldots, Y_n(t))$ is a Gaussian vector whose components are independent identical univariate Gaussian processes with zero mean and unit variance. The mean upcrossing rate is

$$\nu(x) = \sigma_{\dot X}\,[x/2\pi n]^{1/2}\,f(x) \qquad (18)$$

where $f(x) = (x/2)^{n/2-1} e^{-x/2} / 2\Gamma(n/2)$. Table 4 gives ratios of ν to the mean upcrossing rate based on the translation approximation. The translation approximation is conservative for large thresholds, and this observation is correct asymptotically as $x \to \infty$ [2].
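The chi-square upcrossing rate is simple to evaluate once the density f(x) is available. The sketch below assumes Eq. (18) reads ν(x) = σ_Ẋ [x/(2πn)]^(1/2) f(x), with σ_Ẋ the standard deviation of the derivative of X(t); since σ_Ẋ = 2√n σ_Ẏ for independent unit-variance components, this is the same as σ_Ẏ (2x/π)^(1/2) f(x), which the last lines check. The function names are hypothetical.

```python
import math


def chi_square_pdf(x, n):
    """Density of a chi-square variable with n degrees of freedom (x > 0)."""
    return (x / 2.0) ** (n / 2.0 - 1.0) * math.exp(-x / 2.0) / (2.0 * math.gamma(n / 2.0))


def chi_square_upcrossing_rate(x, n, sigma_xdot):
    """Mean upcrossing rate of level x > 0 for a chi-square process."""
    return sigma_xdot * math.sqrt(x / (2.0 * math.pi * n)) * chi_square_pdf(x, n)


# Consistency check for n = 2, x = 2 and unit derivative variance of each Y_j.
n, x, sigma_ydot = 2, 2.0, 1.0
sigma_xdot = 2.0 * math.sqrt(n) * sigma_ydot
rate = chi_square_upcrossing_rate(x, n, sigma_xdot)
rate_alt = sigma_ydot * math.sqrt(2.0 * x / math.pi) * chi_square_pdf(x, n)
```

Dividing such exact rates by the corresponding translation-approximation rates would give ratios of the kind reported in Table 4.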


In addition to the translation approximation, one can develop approximations analogous to the asymptotic approximations given in Section II. These will not be pursued here, and the interested reader is referred to the relevant literature [1,5].


Now  consider  the  more  general  case  in  Eq.  (12)  in  which  h  is 
a  function  which  vanishes  for  t<0.  Since  g  is  nonlinear,  g(Y(t)) 
is  non-Gaussian  and  so  is  X(t).  Direct  determination  of  the 
probability  law  of  X(t)  from  Eq.  (12)  is  usually  impractical.  An 
approximation  is  preferred  to  estimate  mean  crossing  rates  of 
X(t).  The  approximation  can  be  based  on  the  simplified 
representation  m+s(t)Z  of  Y(t)  in  which  Z  is  a  vector  of 
independent  standard  normal  random  variables,  m  is  a  mean  vector, 
and 


$$s(t) = \begin{bmatrix} s_1(t) & & 0 \\ & \ddots & \\ 0 & & s_n(t) \end{bmatrix} \qquad (19)$$

where $s_j(t)$, j = 1,...,n, are vectors of sines and cosines with

appropriate  coefficients.  In  the  special  case  in  which  g(y) 
the  process  X(t)  is  the  following  quadratic  form 


$$X(t) = Z^T a(t)\,Z + b^T(t)\,Z + c(t) \qquad (20)$$

In contrast to the form in Eq. (16), the coefficients of this quadratic form depend on time and they operate on the random vector Z, while in Eq. (16) the coefficients are constant and operate on the random vector process Y(t). One can derive the characteristic functions of X(t) and $(X(t), \dot{X}(t))$, which can be used to find the marginal density and mean upcrossing rate of X(t) [2].

The representation in Eq. (20) can be applied to estimate the response of a structure to wind loads that are proportional to the square of the wind speed Y(t). It is assumed that the structure is modeled by a simple oscillator with response function $h(s) = \exp\{-\zeta\omega_0 s\}\sin(\omega_d s)/\omega_d$, $s \ge 0$, where $\omega_d = \omega_0[1 - \zeta^2]^{1/2}$. As a numerical example, Y(t) is taken to have mean 6.57 and the discrete spectrum given in Table 5. The system parameters are

<  =  O.Or  6.28.  The  marginal  density  of  the  response  X(t) 
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is  illustrated  in  figure  5.  Mean  upcrossing  rates  are  found  by 
the translation approximation and by the joint characteristic function method. A comparison of these is given in Table 6. As in the other cases considered, the translation approximation is seen to be conservative with respect to the exact mean upcrossing rate.

IV. CONCLUSIONS. Approximate methods have been examined for
the  reliability  analysis  of  time-independent  and  time-dependent 
problems.  Probability  bounds  and  asymptotic  approximations  have 
been  developed  for  the  estimation  of  the  reliability  of  time- 
invariant  structural  problems.  On  the  other  hand,  the 
reliability  estimates  for  the  time-dependent  problems  have  been 
based  on  mean  crossing  rates  out  of  a  domain  of  safety  and  on 
translation  approximations  of  these  crossing  rates.  The 
translation  approximations  have  been  found  to  be  conservative  in 
the  cases  examined. 
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Table  1.  Exact  and  Estimated  Values  of  Pp  in  Eq.  (9) 


 f_y/f_0        P_F          P̂_F, Sample 1    P̂_F, Sample 2
    1       5.75 x 10^-1      5.81 x 10^-1      5.68 x 10^-1
    2       1.37 x 10^-1      1.41 x 10^-1      1.34 x 10^-1
    3       1.83 x 10^-2      1.85 x 10^-2      1.77 x 10^-2
    4       1.37 x 10^-3      1.34 x 10^-3      1.30 x 10^-3

Table  2  Exact  and  Estimated  Values  of  Pp  in  Eq.  (10) 


PF 

a 

PF 

Sample  1 

Sample  2 

0.0 

1.61  x  10“5 

lO 

o 

X 

CD 

• 

1.61  x  10~b 

0.4 

3.57  x  10"3 

7.02  x  10~3 

6.84  x  10-3 

0.8 

2.19  x  10_1 

2.92  x  10-1 

2.64  x  10-1 

1.0 

5.80  x  10-1 

6.21  x  10 1 

5.89  x  10"1 

Table 3. Exact and Approximate Mean Crossing Rates for the von Mises Criterion

    x          ν            ν/ν_T
    1     2.49 x 10^-1      1.25
    4     1.07 x 10^-1      0.96
    9     1.98 x 10^-2      0.87
   16     1.90 x 10^-3      0.83
   25     9.42 x 10^-5      0.81
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Table  4.  Exact  and  Approximate  Values  of  Mean  Crossing  Rates 

for  the  Chi  Square  Process 


1  n  ■  1  1 

n  =  2 

n  =  i 

3 

X 

V 

B09 

V 

EBSai 

V 

v/i>T 

0 

9.85xl0_2 

1-08 

7. 34xl0— 2 

1.05 

4.87xl0~2 

1.03 

3 

1.16xl0-2 

0.87 

7.31X10-3 

0.88 

3. 56xl0~3 

0.89 

6 

1.39xl0-3 

0.84 

4.81xl0“4 

0.84 

8.49xl0~5 

0.84 

9 

1.66xl0“4 

CO 

00 

• 

o 

2. 86xl0~4 

0.81 

1 . 46xl0-6 

0.83 

Table 5.  Power Spectral Density of Wind Speed

  [row headings illegible]
  1.75   2.25   3.25   6.25   7.75   9.25
  1.26   0.34   0.20   0.13   0.11   0.14   0.09   0.07   0.05   0.05

Table 6.  Exact and Approximate Values of Mean Crossing Rates
          for a Structure Subject to Wind Load

     x      ν               ν/ν_T
     0      6.15 x 10^-1    1.00
     2      1.12 x 10^-1    0.81
     4      7.47 x 10^-3    0.74
     6      3.18 x 10^-4    0.70
     8      1.02 x 10^-5    0.65
0.65 
Figure 3.  Ratios of Asymptotic to Exact Values
           of the Probability of Failure for
           Elliptical Safe Domains.

Figure 4.  Exact and Asymptotic Values of the
           Probability of Failure for Exponential
           Random Variables.

Figure 5.  Marginal Density of the Response of a
           Linear System to Wind Load.

THE THEORY OF RANDOM WAVE OPERATORS


Marc A. Berger
Carnegie Mellon University
Pittsburgh, Pennsylvania

§1. Introduction and Overview: An Operational Ito Calculus

In order to describe the method of random characteristics
we begin by considering cases which lead to the classical
second order Ito theory. Let $\sigma: \mathbb{R}^m \to \mathbb{R}^m$
be of class $C^1$, and let $X = X_\sigma$ denote the vector field
$X := \sigma \cdot \nabla$. As usual $\exp(X)$ denotes the mapping
$f(x) \mapsto f(\xi(1;x))$, where $\xi(\tau;x)$ satisfies
$d\xi/d\tau = \sigma(\xi)$, $\xi(0) = x$. The Banach space on which
$\exp(X)$ acts is $C(\dot{\mathbb{R}}^m)$ with the supremum norm,
$\dot{\mathbb{R}}^m$ being the one point compactification of
$\mathbb{R}^m$. It is clear that $\exp(tX)$, $t \in \mathbb{R}$, is
the group generated by $X$. Observe that we are allowing time, $t$,
to run backward and forward.

Let $\theta(t)$ be standard one-dimensional Brownian motion.
The mgf of $\theta(t)$ is given by

    $\mathbb{E}\, e^{x\theta(t)} = e^{\frac{1}{2}x^2 t}$;  $x \in \mathbb{R}$, $t > 0$.   (1)

We can ask about the validity of (1) if $x$ were to be replaced
by $X$ above. Let us examine what each side of (1) would then
mean. On the left-hand side $e^{X\theta(t)}$ would represent the
random wave operator $f(x) \mapsto f(\xi(\theta(t);x))$, and so
$\mathbb{E}\, e^{X\theta(t)}$ would be the operator

    $f(x) \mapsto \int_{\mathbb{R}} f(\xi(z;x))\, p(z,t)\, dz$,   (2)

where $p$ is the Gauss kernel
$p(z,t) = (2\pi t)^{-1/2} \exp(-z^2/2t)$. On the right-hand side we
would have the semi-group operator $\exp(\frac{1}{2} t X^2)$,
generated by $\frac{1}{2} X^2$. In fact with this interpretation
(1) does remain valid when $x$ is replaced with $X$.
The easiest way to see this is to use the fact that $p$ is the
fundamental solution of
$\partial u/\partial t = \frac{1}{2}\, \partial^2 u/\partial z^2$.
Thus, upon integrating by parts, one sees that

    $u(x,t) = \int_{\mathbb{R}} (e^{zX} f)(x)\, p(z,t)\, dz$   (3)

satisfies the evolution equation $du/dt = \frac{1}{2} X^2 u$, $u(0) = f$.
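As a quick numerical sanity check of the claim that (1) survives the replacement of $x$ by $X$, the following sketch evaluates the integral (2) for one concrete case. The choices here are illustrative assumptions and not taken from the text: the one-dimensional field $X = x\, d/dx$ (so $\sigma(x) = x$ and $\xi(\tau;x) = x e^{\tau}$), the test function $f(x) = x$, and trapezoid quadrature. Note that $\sigma(x) = x$ is unbounded, so the example sets aside the compactification used above. For this field $\frac{1}{2}X^2 f = \frac{1}{2} f$, so $\exp(\frac{1}{2}tX^2) f = e^{t/2} f$, and (2) should reproduce that value.

```python
import math

def gauss_kernel(z, t):
    # p(z,t) = (2*pi*t)^(-1/2) * exp(-z^2 / (2t))
    return math.exp(-z * z / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def flow(tau, x):
    # xi(tau; x) for d(xi)/d(tau) = sigma(xi), with the assumed sigma(x) = x
    return x * math.exp(tau)

def averaged_wave_operator(f, x, t, half_width=12.0, steps=20000):
    # Trapezoid approximation of (2): integral of f(xi(z;x)) p(z,t) dz
    a = -half_width * math.sqrt(t)
    b = half_width * math.sqrt(t)
    h = (b - a) / steps
    total = 0.5 * (f(flow(a, x)) * gauss_kernel(a, t)
                   + f(flow(b, x)) * gauss_kernel(b, t))
    for i in range(1, steps):
        z = a + i * h
        total += f(flow(z, x)) * gauss_kernel(z, t)
    return total * h

def semigroup(x, t):
    # exp(t X^2 / 2) applied to f(x) = x, using X^2 f = f for X = x d/dx
    return math.exp(t / 2.0) * x

x0, t0 = 1.5, 0.8
lhs = averaged_wave_operator(lambda y: y, x0, t0)
rhs = semigroup(x0, t0)
print(lhs, rhs)  # the two values should agree closely
```

The quadrature window of twelve standard deviations is a convenience; any window wide enough for the Gaussian tail to be negligible would do.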

It is interesting to interpret this representation (2)
for the semi-group geometrically, in terms of the
characteristics $\xi$. The domain of influence of a region
$D \subset \mathbb{R}^m$, at any time $t > 0$, is that region spanned
by the characteristic curves which pass through $D$ at time zero.
(Recall that "time" along the characteristics runs both backward
and forward.)

Next let $\sigma_k: \mathbb{R}^m \to \mathbb{R}^m$, $1 \le k \le \ell$,
be $\ell$ such maps of class $C^1$ and correspondingly let
$X_k = X_{\sigma_k} = \sigma_k \cdot \nabla$. If
$\theta(t) = (\theta_1(t),\ldots,\theta_\ell(t))$ is standard
$\ell$-dimensional Brownian motion then

    $\mathbb{E} \exp(\langle x, \theta(t)\rangle) = \exp(\frac{1}{2} t \sum_k x_k^2)$;  $x \in \mathbb{R}^\ell$, $t > 0$.   (4)

Now we can ask about replacing $x = (x_1,\ldots,x_\ell)$ with
$X = (X_1,\ldots,X_\ell)$ in (4). At this point we already know what
interpretation both sides of (4) would have. Here the random
wave operator under consideration would be
$\exp(\langle X, \theta(t)\rangle)$, and the semi-group under
consideration would be the one generated by $\frac{1}{2}\sum_k X_k^2$.
This time the answer is YES (the replacement $x \to X$ in (4) is
valid) if the $X_k$ commute, but NO in general.


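Example 1:  $X_1 = x_2\, \partial/\partial x_1$,  $X_2 = \partial/\partial x_2$.

$\mathbb{E} \exp(\langle X, \theta(t)\rangle)$:

    $f(x) \mapsto \int_{\mathbb{R}^2} f(x_1 + z_1 x_2 + \frac{1}{2} z_1 z_2,\; x_2 + z_2)\, p(z_1,t)\, p(z_2,t)\, dz$

$\exp(\frac{1}{2} t (X_1^2 + X_2^2))$:

    $f(x) \mapsto \mathbb{E}\, f(x_1 + x_2 \theta_1(t) + \int_0^t \theta_2(s)\, d\theta_1(s),\; x_2 + \theta_2(t))$.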
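To see concretely why the answer is NO for the pair $X_1 = x_2\,\partial/\partial x_1$, $X_2 = \partial/\partial x_2$, one can apply both operators to a single test function. The choice $f(x) = x_1^2$ below is an illustrative assumption; it is unbounded and so lies outside the Banach space used above, but both formulas still make sense for it. Write $A = \frac{1}{2}(X_1^2 + X_2^2)$ and use $\mathbb{E} z_i^2 = t$ for independent $z_1, z_2$ with density $p(\cdot,t)$.

```latex
\begin{align*}
\bigl(\mathbb{E}\exp\langle X,\theta(t)\rangle f\bigr)(x)
  &= \int_{\mathbb{R}^2} \bigl(x_1 + z_1 x_2 + \tfrac12 z_1 z_2\bigr)^2
       p(z_1,t)\,p(z_2,t)\,dz
   = x_1^2 + t\,x_2^2 + \tfrac14 t^2, \\
A f &= x_2^2, \qquad A^2 f = 1, \qquad A^3 f = 0, \\
\bigl(\exp(tA) f\bigr)(x)
  &= f + t\,Af + \tfrac12 t^2 A^2 f
   = x_1^2 + t\,x_2^2 + \tfrac12 t^2 .
\end{align*}
```

The two results agree to first order in $t$ and differ by $t^2/4$, so the averaged wave operator is not the semi-group itself but is close to it for small $t$.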

What is in fact true, however, is that for small $t$,
$\mathbb{E} \exp(\langle X, \theta(t)\rangle)$ is very close to
$\exp(\frac{1}{2} t \sum_k X_k^2)$, in the sense that

    $\prod_0^t \mathbb{E} \exp(\langle X, \theta(du)\rangle) = \exp(\frac{1}{2} t \sum_k X_k^2)$.   (5)

If $T(t)$ denotes the operator
$\mathbb{E} \exp(\langle X, \theta(t)\rangle)$ then the product
integral here indicates a Riemann-type strong limit

    $\prod_0^t T(du) = \text{st-}\lim_n \prod_i T(\Delta_{in})$,   (6)

where $0 = t_{0n} < t_{1n} < \ldots < t_{\nu_n n} = t$ forms a
partition of $[0,t)$, $\Delta_{in} = t_{in} - t_{i-1,n}$ and
$\lim_n \max_i \Delta_{in} = 0$. We shall see that (6) follows from
a version of Chernoff's product formula [8] which allows for
variable step size. (See Pierre and Rihani [18].)
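The limit in (6) can be watched directly for the pair $X_1 = x_2\,\partial/\partial x_1$, $X_2 = \partial/\partial x_2$. The sketch below is an illustration under stated assumptions, not the author's computation: it restricts attention to functions of the form $a x_1^2 + b x_2^2 + c$, a three-dimensional space that $T(\Delta)$ maps into itself, and uses a uniform partition. On this space $T(\Delta)$ sends $x_1^2$ to $x_1^2 + \Delta x_2^2 + \Delta^2/4$, sends $x_2^2$ to $x_2^2 + \Delta$, and fixes constants. The function names are hypothetical.

```python
from fractions import Fraction

def averaged_step(coeffs, delta):
    # One application of T(delta) = E exp(<X, theta(delta)>) to
    # f = a*x1^2 + b*x2^2 + c, for X1 = x2 d/dx1, X2 = d/dx2.
    a, b, c = coeffs
    return (a, b + a * delta, c + a * delta * delta / 4 + b * delta)

def product_approximation(t, n):
    # Uniform partition of [0, t) into n steps: T(t/n) applied n times
    # to f = x1^2, i.e. starting from coefficients (1, 0, 0).
    delta = Fraction(t) / n
    coeffs = (Fraction(1), Fraction(0), Fraction(0))
    for _ in range(n):
        coeffs = averaged_step(coeffs, delta)
    return coeffs

def semigroup_limit(t):
    # exp(t/2 (X1^2 + X2^2)) x1^2 = x1^2 + t*x2^2 + t^2/2
    t = Fraction(t)
    return (Fraction(1), t, t * t / 2)

for n in (1, 10, 100, 1000):
    approx = product_approximation(2, n)
    gap = semigroup_limit(2)[2] - approx[2]
    print(n, approx, float(gap))
```

With exact rational arithmetic the constant term after $n$ steps is $t^2/2 - t^2/(4n)$, so the gap to the semi-group closes at rate $1/n$.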

Actually for the case at hand where $\theta$ is Brownian motion,
one can interchange the expectation and product integral
in (5),

    $\prod_0^t \mathbb{E} \exp(\langle X, \theta(du)\rangle) = \mathbb{E} \prod_0^t \exp(\langle X, d\theta(u)\rangle)$,   (7)

thereby unveiling sample path behavior. At this point to
arrive at (7) we are exploiting the stationary independent
increment property of Brownian motion. The product integral
on the right in (7) is now a sample path limit, indicated by

    $\prod_0^t \exp(\langle X, d\theta(u)\rangle) = \text{st-}\lim_n \prod_i \exp(\langle X, \Delta_{in}\theta\rangle)$,   (8)

where the partitions $0 = t_{0n} < t_{1n} < \ldots < t_{\nu_n n} = t$
are as above, and $\Delta_{in}\theta = \theta(t_{in}) - \theta(t_{i-1,n})$.
This resembles McKean's injection set-up for Lie groups [13, §4.8].
In fact if the vector fields $X_k$, $1 \le k \le \ell$, belong to a
finite dimensional Lie algebra (e.g., $\sigma_k(x) =$ [illegible])
then this fits precisely into McKean's set-up. The st-lim
in (8) indicates a strong limit in the bounded linear operator
sense, but we also need to specify the probabilistic mode
of convergence. In what sense is the sequence of random
variables

    $\| \prod_i \exp(\langle X, \Delta_{in}\theta\rangle) f - \prod_0^t \exp(\langle X, d\theta(u)\rangle) f \|$

converging to zero, for each $f \in C(\dot{\mathbb{R}}^m)$?

There is a nice concise way of describing the operator
$\prod_i \exp(\langle X, \Delta_{in}\theta\rangle)$, appearing on the
right-hand side of (8). Let $\psi^{(n)}$ denote the
piecewise-constant function
$\psi^{(n)}(\tau) = \Delta_{in}\theta/\Delta_{in}$,
$t_{i-1,n} \le \tau \le t_{in}$, and let $\xi^{(n)}(\tau;x)$ denote
the solution of
$d\xi^{(n)}/d\tau = \sum_k \psi_k^{(n)}(t-\tau)\, \sigma_k(\xi^{(n)})$,
$\xi^{(n)}(0) = x$. By a simple time scale one sees that

    $\prod_i \exp(\langle X, \Delta_{in}\theta\rangle):\; f(x) \mapsto f(\xi^{(n)}(t;x))$.   (9)
i  in 

Observe that applying the operators in the order

    $\exp(\langle X, \Delta_{\nu_n n}\theta\rangle) \cdots \exp(\langle X, \Delta_{2n}\theta\rangle) \exp(\langle X, \Delta_{1n}\theta\rangle)$

(application proceeds from right to left) leads to the time
reversed evolution

    $f(x) \mapsto f(\xi_1(\Delta_{1n}; \xi_2(\Delta_{2n}; \ldots \xi_{\nu_n}(\Delta_{\nu_n n}; x) \ldots )))$,

starting from $\xi_{\nu_n}$ and working back to $\xi_1$, where
$\xi_i(\tau;x)$ denotes the solution of
$d\xi_i/d\tau = \sum_k (\Delta_{in}\theta_k/\Delta_{in})\, \sigma_k(\xi_i)$,
$\xi_i(0) = x$. The time reversal here can be straightened out,
though, by reversing time in the Brownian motion instead.
Thus, if $\tilde\theta(\tau) = \theta(t) - \theta(t-\tau)$ then
$\psi^{(n)}(t-\tau) = \tilde\Delta_{in}\tilde\theta/\tilde\Delta_{in}$,
$\tilde t_{i-1,n} < \tau \le \tilde t_{in}$, and $\tilde\Delta$ refers
to the reversed partition $\tilde t_{in} = t - t_{\nu_n - i,n}$.
Under the additional assumption that the first partial
derivatives of $\sigma$ are uniformly Lipschitz continuous,
Stroock and Varadhan [21] have shown that for the sequence of
dyadic partitions $\nu_n = [2^n t] + 1$, $t_{in} = i/2^n$,
$0 \le i \le [2^n t]$, $t_{\nu_n n} = t$ one has the convergence in
distribution $\xi^{(n)} \Rightarrow \xi$ where $\xi$ is the solution
of the Stratonovich stochastic differential equation

    $d\xi = \sum_k \sigma_k(\xi) \circ d\tilde\theta_k$,  $\xi(0) = x$.   (10)
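For the pair $X_1 = x_2\,\partial/\partial x_1$, $X_2 = \partial/\partial x_2$ the link between composed flows and a Stratonovich equation can be made explicit. The time-one flow of $z_1 X_1 + z_2 X_2$ is $(x_1, x_2) \mapsto (x_1 + z_1 x_2 + \frac{1}{2} z_1 z_2,\; x_2 + z_2)$, and composing such flows over the increments of a path gives exactly the midpoint (Stratonovich-type) Riemann sum for $x_1 + \int \xi_2 \circ d\theta_1$. The sketch below checks this identity on a simulated path. It is an illustration only: the function names are hypothetical, the increments are taken in the order in which the flows are applied, and the random path merely stands in for Brownian increments.

```python
import random

def flow_step(x, z):
    # Time-one flow of z1*X1 + z2*X2 with X1 = x2 d/dx1, X2 = d/dx2
    x1, x2 = x
    z1, z2 = z
    return (x1 + z1 * x2 + 0.5 * z1 * z2, x2 + z2)

def compose_flows(x, increments):
    # Apply one flow per partition interval, in the order given
    for z in increments:
        x = flow_step(x, z)
    return x

def midpoint_sum(x, increments):
    # x1 + sum over steps of (midpoint of the x2-path) * (theta_1 increment)
    x1, x2 = x
    path2 = x2
    for z1, z2 in increments:
        x1 += (path2 + 0.5 * z2) * z1
        path2 += z2
    return (x1, path2)

def brownian_increments(t, steps, rng):
    # Independent N(0, t/steps) increments for two driving components
    sd = (t / steps) ** 0.5
    return [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(steps)]

rng = random.Random(0)
incs = brownian_increments(1.0, 1024, rng)
print(compose_flows((0.3, -0.2), incs))
print(midpoint_sum((0.3, -0.2), incs))
```

The two printed pairs agree up to rounding, which is the discrete counterpart of the statement that the composed flows solve the equation driven by the piecewise linear interpolant of the path.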


(This is one of the advantages of using the vector field form
$\frac{1}{2}\sum_k X_k^2$, rather than the form
$\sum a_{ij}\, \partial^2/\partial x_i \partial x_j + \sum b_i\, \partial/\partial x_i$,
for the generator. Namely, the Ito stochastic differential
equation for its underlying diffusion corresponds simply to
the above Stratonovich equation. On the other hand the
vector field form is restrictive: not every generator has
this form, even if we allow an additional first order term
$X_0$ as done below.) One should identify $\xi^{(n)}$ appearing in
(9) as the solution of the equation obtained from (10) by
replacing $\tilde\theta$ with its piecewise linear interpolant
passing through the interpolation points
$(\tilde t_{in}, \tilde\theta(\tilde t_{in}))$, $0 \le i \le \nu_n$.
(Cf. Wong and Zakai [23].) Thus we discover the form that
our limiting operator in (8) ought to have:

    $\prod_0^t \exp(\langle X, d\theta(u)\rangle):\; f(x) \mapsto f(\xi(t;x))$,   (11)

where $\xi$ is the solution of (10). It is this type of operator
we refer to as a "random wave operator." When considered
in terms of two parameters $\prod_s^t$, these operators form a
random two parameter semi-group (with stationary independent
increments, much like Brownian motion on a Lie group).

The representation (11) ought properly to be understood
as "the fundamental theorem of calculus" for the product
integrals of this form. It shows that the product integral
obeys a certain differential equation (namely (10)), when
considered as a function of its upper limit, and thus
obviates the necessity of resort to partitions
$0 = t_{0n} < t_{1n} < \ldots < t_{\nu_n n} = t$ for its evaluation.
In fact, it relates the calculus of these product integrals to
the technically rich and easily mastered second order, or Ito,
calculus.
The product formula (5) has implications concerning
the support properties of $\exp(\frac{1}{2} t \sum_k X_k^2)$. For
$z \in \mathbb{R}^\ell$ let us denote by $\xi(\tau;x,z)$ the solution
of $d\xi/d\tau = \sum_k z_k \sigma_k(\xi)$, $\xi(0) = x$. Observe then
that $T(\Delta_{in})$ from (6) can be expressed as

    $T(\Delta_{in}):\; f(x) \mapsto \int_{\mathbb{R}^\ell} f(\xi(1;x,z))\, p(z,\Delta_{in})\, dz$,   (12)

where we have introduced the notation $p(z,t) = \prod_k p(z_k,t)$.
Thus in order that $x \in \mathrm{supp}\, T(\Delta_{in})f$ there must
exist a set $C \subset \mathbb{R}^\ell$ of positive Lebesgue measure,
for which $\xi(1;x,z) \in \mathrm{supp}\, f$, $\forall z \in C$. We
refer to $C$ as a control set, through which $\xi$ can be
controlled to go from $x$ (at time zero) into supp $f$ (at time
one). By a simple time scale we see that
$\xi(1;x,z) = \xi(\Delta_{in}; x, z/\Delta_{in})$, and we can use
$C/\Delta_{in}$ to control $\xi$ so as to enter supp $f$ at time
$\Delta_{in}$ (rather than time one).
Extending this argument we can represent


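    $\prod_i T(\Delta_{in}):\; f(x) \mapsto \int_{\mathbb{R}^\ell \times \ldots \times \mathbb{R}^\ell} f(\xi(t;x,z^{(1)},\ldots,z^{(\nu_n)})) \prod_i p(\tilde\Delta_{in} z^{(i)}, \tilde\Delta_{in})\, \tilde\Delta_{in}^{\,\ell}\, dz^{(i)}$   (13)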

where $\xi(t;x,z^{(1)},\ldots,z^{(\nu_n)}) = \xi(t;x,\psi^Z)$ denotes the
solution of $d\xi/d\tau = \sum_k \psi_k^Z(\tau)\, \sigma_k(\xi)$,
$\xi(0) = x$, and $\psi^Z(\tau) = z^{(i)}$,
$\tilde t_{i-1,n} \le \tau < \tilde t_{in}$. Here we use the notation
$Z$ for $(z^{(1)},\ldots,z^{(\nu_n)})$. Thus for each choice
$z^{(1)},\ldots,z^{(\nu_n)} \in \mathbb{R}^\ell$ we associate the
piecewise constant function $\psi^Z$ which takes the value $z^{(i)}$
over the interval $[\tilde t_{i-1,n}, \tilde t_{in})$ obtained from
the given partition. (Recall that
$\tilde t_{in} = t - t_{\nu_n - i,n}$.) The controls, then, here are
functions $[0,t) \to \mathbb{R}^\ell$ which are piecewise constant
over the fixed partition intervals
$[\tilde t_{i-1,n}, \tilde t_{in})$. Each such function can be
uniquely associated with some
$Z \in \mathbb{R}^\ell \times \ldots \times \mathbb{R}^\ell$, and we use
the notation $\psi^Z$ to denote this correspondence. The action of
the control $\psi$ is given by the differential equation
$d\xi/d\tau = \sum_k \psi_k(\tau)\, \sigma_k(\xi)$. It follows from
(13) then that $x \in \mathrm{supp} \prod_i T(\Delta_{in}) f$ only if
there exists a set
$C \subset \mathbb{R}^\ell \times \ldots \times \mathbb{R}^\ell$ of
positive Lebesgue measure, for which
$\xi(t;x,\psi^Z) \in \mathrm{supp}\, f$, $\forall Z \in C$. That is, we
need to use $\psi^Z$ to control the process $\xi$ so as to go from
$x$ (at time zero) into supp $f$ (at time $t$). Furthermore, if
$f \ge 0$ then this control criterion is also a sufficient
condition for $x \in \mathrm{supp} \prod_i T(\Delta_{in}) f$.

Letting $n \to \infty$ in (6) effectively picks up all piecewise
constant, or step functions $\psi$. (Especially if our techniques
allow us to use arbitrary partitions. Otherwise we may be
restricted, say, to dyadic step functions, with dyadic
step intervals.) The product formula (7) then implies that
$x \in \mathrm{supp} \exp(\frac{1}{2} t \sum_k X_k^2) f$ only if there
exist step function controls $\psi$, through which $\xi$ can be
controlled so as to go from $x$ (at time zero) into supp $f$ (at
time $t$). It is clear from the equation
$d\xi/d\tau = \sum_k \psi_k(\tau)\, \sigma_k(\xi)$ that this property
is in fact independent of $t > 0$. We see from a time scale that
if $\xi$ can be controlled to get to supp $f$ at time $t$, then it
can be controlled to get to supp $f$ at any other time $t'$.
However, we leave the "time $t$" requirement in the above
support-control condition, since it will be important when
we add on a first order vector field $X_0$, and describe
supp $\exp(t(\frac{1}{2} \sum_k X_k^2 + X_0))$. There the
support-control condition will not be independent of $t$.

Using the representation (10), (11) we see that this
support-control condition above in fact amounts precisely
to the Stroock-Varadhan characterization for the support of
a diffusion [21]. Their result establishes that the support
of the diffusion $\xi$ satisfying (10) is precisely the closure
of the set of $\eta \in C([0,\infty), \mathbb{R}^m)$ for which there
exists a step function $\psi: [0,\infty) \to \mathbb{R}^\ell$ such that
$d\eta/d\tau = \sum_k \psi_k(\tau)\, \sigma_k(\eta)$, $\eta(0) = x$.

Example 2:  $X_1 = 2x_2\, \partial/\partial x_1 + x_1\, \partial/\partial x_2$,  $X_2 = x_3\, \partial/\partial x_3$.

The propagation is confined to hyperbolic cylinders. If
supp $f$ lies in one or both parts of the wedge
$a \le x_1^2 - 2x_2^2 \le b$ on one side (above/below) of the
$x_1 x_2$-plane, then so does
supp $\exp(\frac{1}{2} t (X_1^2 + X_2^2)) f$.

Our next interest is in extending (7) to a product
integral representation for the general linear evolution
equation

    $\partial u/\partial t = [\frac{1}{2} \sum_k X_k^2(t) + X_0(t)]u + a(x,t)u + b(x,t)$,   (14)

where the vector fields $X_k(t) = \sigma_k \cdot \nabla$ now come from
time dependent mappings $\sigma_k(x,t)$ from
$\mathbb{R}^m \times (0,\infty) \to \mathbb{R}^m$, $0 \le k \le \ell$.
To begin with we need to identify $\exp(X + a(x))$, where
$X = \sigma \cdot \nabla$ is a vector field, and
$a: \mathbb{R}^m \to \mathbb{R}$ is a bounded measurable mapping.
Since we want $\exp(t(X + a(x)))$ to be the group generated by
$X + a(x)$ we define

    $\exp(X + a(x)):\; f(x) \mapsto f(\xi(1;x)) \exp(\int_0^1 a(\xi(\tau;x))\, d\tau)$   (15)

where, as above, $\xi(\tau;x)$ is the solution of
$d\xi/d\tau = \sigma(\xi)$, $\xi(0) = x$. Consider now the product


    $\prod_i \exp(\langle X(t_{i-1,n}), \Delta_{in}\theta\rangle + [X_0(t_{i-1,n}) + a(x,t_{i-1,n})]\Delta_{in})$,

involving the $\ell$-tuple of time dependent vector fields
$X(t) = (X_1(t),\ldots,X_\ell(t))$ along with $X_0(t)$ and the mapping
$a(x,t)$. A careful "keeping track of things" reveals that this is
the random wave operator

    $f(x) \mapsto f(\xi^{(n)}(t;x)) \exp(\int_0^t a^{(n)}(\xi^{(n)}(\tau;x), t-\tau)\, d\tau)$,

where $\xi^{(n)}(\tau;x)$ is the solution of

    $d\xi^{(n)}/d\tau = \sum_k \psi_k^{(n)}(t-\tau)\, \sigma_k^{(n)}(\xi^{(n)}, t-\tau) + \sigma_0^{(n)}(\xi^{(n)}, t-\tau)$,  $\xi^{(n)}(0) = x$

and

    $\psi^{(n)}(\tau) = \Delta_{in}\theta/\Delta_{in}$,  $\sigma_k^{(n)}(x,\tau) = \sigma_k(x,t_{i-1,n})$,  $a^{(n)}(x,\tau) = a(x,t_{i-1,n})$

for $t_{i-1,n} \le \tau < t_{in}$. By reversing time it is easy to see
that $\xi^{(n)}(\tau;x) = \hat\xi^{(n)}(t-\tau;x)$ where
$\hat\xi^{(n)}$ satisfies
$d\hat\xi^{(n)}/d\tau = -\sum_k \psi_k^{(n)}(\tau)\, \sigma_k^{(n)}(\hat\xi^{(n)},\tau) - \sigma_0^{(n)}(\hat\xi^{(n)},\tau)$,
$\hat\xi^{(n)}(t) = x$. Thus one expects the product integral
$\prod_0^t \exp(\langle X(u), d\theta(u)\rangle + [X_0(u) + a(x,u)]du)$
to be the random wave operator

    $f(x) \mapsto f(\xi(0;x)) \exp(\int_0^t a(\xi(\tau;x),\tau)\, d\tau)$,

where $\xi(\tau;x)$ satisfies the backward Stratonovich differential
equation

    $d\xi = -\sum_k \sigma_k(\xi,\tau) \circ d\theta_k(\tau) - \sigma_0(\xi,\tau)\, d\tau$,  $\xi(t) = x$.   (16)

Thus, using the technique of variation of parameters for
product integrals (Dollard and Friedman [9]), we see that the
solution of (14) with initial condition $u(x,0) = f(x)$ is
given by

    $u(x,t) = \mathbb{E}\, [\, f(\xi(0;x)) \exp(\int_0^t a(\xi(\tau;x),\tau)\, d\tau)$
    $\qquad\qquad + \int_0^t b(\xi(r;x),r) \exp(\int_r^t a(\xi(\tau;x),\tau)\, d\tau)\, dr \,]$.   (17)
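The definition (15) is chosen so that $\exp(t(X + a(x)))$ is a group, and this can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not in the text: the field $X = d/dx$ on the line (so $\xi(\tau;x) = x + \tau$), the bounded coefficient $a(x) = \cos x$, a rational test function, and Simpson quadrature for the integral along the flow. For $\exp(t(X + a))$ the time-one flow in (15) becomes the time-$t$ flow and the integral runs over $[0,t]$.

```python
import math

def flow(tau, x):
    # xi(tau; x) for the assumed field X = d/dx, i.e. sigma = 1
    return x + tau

def potential(x):
    # assumed bounded zeroth-order coefficient a(x)
    return math.cos(x)

def integral_along_flow(t, x, steps=2000):
    # Simpson's rule for the integral of a(xi(tau; x)) over tau in [0, t]
    h = t / steps
    total = potential(flow(0.0, x)) + potential(flow(t, x))
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * potential(flow(i * h, x))
    return total * h / 3.0

def apply_group(t, f):
    # exp(t(X + a)) f, following (15) with time one replaced by time t
    return lambda x: f(flow(t, x)) * math.exp(integral_along_flow(t, x))

f = lambda x: 1.0 / (1.0 + x * x)
s, t, x0 = 0.4, 0.7, 0.25
two_steps = apply_group(s, apply_group(t, f))(x0)
one_step = apply_group(s + t, f)(x0)
print(two_steps, one_step)  # group property: the two values should agree
```

For this particular choice the integral has the closed form $\sin(x+t) - \sin x$, which gives an independent check on the quadrature.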


§2. Extension to Higher Order Equations

In order to extend these ideas to higher order equations
we need to introduce the fundamental solution $p_n(z,t)$ for
the equation
$\partial u/\partial t = ((-1)^{(n/2)-1}/n!)\, \partial^n u/\partial z^n$,
$n$ even. This function has the scale property
$p_n(z,t) = t^{-1/n} p_n(t^{-1/n} z, 1)$, and $p_n(z,1)$ is given by

    $p_n(z,1) = \frac{1}{\pi} \int_0^\infty \cos(\lambda z) \exp(-\lambda^n/n!)\, d\lambda$.   (1)
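Formula (1) is easy to evaluate numerically, which gives a direct look at two facts: for $n = 2$ it reduces to the Gauss kernel, and for $n = 4$ the kernel is not positive. The sketch below is an illustration only; the truncation of the integral at an upper limit of 12 and the use of Simpson's rule are assumptions made for convenience, and the function names are hypothetical.

```python
import math

def p_n(z, n, upper=12.0, steps=6000):
    # Simpson approximation of
    #   (1/pi) * integral_0^upper cos(lam*z) * exp(-lam^n / n!) d(lam);
    # the integrand is negligible beyond the assumed cutoff for n = 2, 4, 6.
    fact = math.factorial(n)
    h = upper / steps

    def g(lam):
        return math.cos(lam * z) * math.exp(-lam ** n / fact)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / (3.0 * math.pi)

def p_n_scaled(z, t, n):
    # scale property: p_n(z,t) = t^(-1/n) * p_n(t^(-1/n) * z, 1)
    s = t ** (-1.0 / n)
    return s * p_n(s * z, n)

print(p_n(0.0, 2), 1.0 / math.sqrt(2.0 * math.pi))  # n = 2: Gauss kernel at 0
print(p_n(1.8, 4))                                  # n = 4: a negative value
```

The negative value for $n = 4$ is the numerical face of the statement that the transition densities of the generalized Brownian motion are signed.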


Associated with $p_n$ is a generalized Brownian motion
$\theta_{(n)}$ with transition densities $p_n(x,t)$ and infinitesimal
generator $((-1)^{(n/2)-1}/n!)\, \partial^n/\partial x^n$. This process
$\theta_{(n)}$ has been studied by several people ([10], [12], [14],
[16]), and is not a genuine diffusion since its transition
densities $p_n(x,t)$ are signed. In fact it does not even arise
from a signed probability on path space, since such a measure
would necessarily be of infinite total variation. Nevertheless
if we are willing to work in a finitely additive setting
it can be shown that $\theta_{(n)}$ generates an $n$-th order analogue
to Ito's stochastic calculus. In particular if
$\theta_{(n)}(t) = (\theta_{(n)1}(t),\ldots,\theta_{(n)\ell}(t))$ is
$\ell$-dimensional generalized Brownian motion then

    $\mathbb{E} \exp(\langle x, \theta_{(n)}(t)\rangle) = \exp(\frac{(-1)^{(n/2)-1}}{n!}\, t \sum_k x_k^n)$;  $x \in \mathbb{R}^\ell$, $t > 0$.   (2)

This parallels (1.4) and we can again ask about replacing
$x = (x_1,\ldots,x_\ell)$ with $X = (X_1,\ldots,X_\ell)$ for vector fields
$X_k = \sigma_k \cdot \nabla$. Can we expect
    $\exp(\frac{(-1)^{(n/2)-1}}{n!}\, t \sum_k X_k^n) = \prod_0^t \mathbb{E} \exp(\langle X, \theta_{(n)}(du)\rangle) = \mathbb{E} \prod_0^t \exp(\langle X, d\theta_{(n)}(u)\rangle)$,   (3)

where $\prod_0^t \exp(\langle X, d\theta_{(n)}(u)\rangle)$ is to be
interpreted as a random wave operator $f(x) \mapsto f(\xi(t;x))$ and
$\xi$ is the solution of the generalized Stratonovich differential
equation

    $d\xi = \sum_k \sigma_k(\xi) \circ d\tilde\theta_{(n)k}$,  $\xi(0) = x$?   (4)

The use of the name Stratonovich here simply indicates that
the generator of $\xi$ is to have the invariant form
$((-1)^{(n/2)-1}/n!) \sum_k X_k^n$. (Cf. [20, §4].) The
$\tilde\theta$ again indicates a time reversal
$\tilde\theta(\tau) = \theta(t) - \theta(t-\tau)$, $0 \le \tau < t$.
The left equality in (3) can be understood without the need of
setting up a generalized stochastic calculus. The operator
$T(t) = \mathbb{E} \exp(\langle X, \theta_{(n)}(t)\rangle)$ is given by

    $f(x) \mapsto \int_{\mathbb{R}^\ell} f(\xi(1;x,z))\, p_n(z,t)\, dz$,

where $\xi$ satisfies $d\xi/d\tau = \sum_k z_k \sigma_k(\xi)$,
$\xi(0) = x$ and $p_n(z,t) = \prod_k p_n(z_k,t)$. Thus the analogue of
(1.13) holds here, with $p$ replaced by $p_n$. In particular
establishment of the first part of (3) would lead to the
analogous support property for
$((-1)^{(n/2)-1}/n!) \sum_k X_k^n$.
The right equality in (3) comes in only as a handy
calculational tool for the product integral. It shows that the
calculus of these product integrals
$\prod_0^t \exp(\langle X, d\theta_{(n)}(u)\rangle)$ is essentially an
$n$-th order analogue of the Ito calculus. This
is that same "fundamental theorem of calculus" described
above.

In order to allow for lower order terms in the generator
we introduce the generalized Appell polynomials (see Bell
[1]) $\phi_n: \mathbb{R}^{n-1} \to \mathbb{R}$ defined by

    $\phi_n(y_1,\ldots,y_{n-1}) = \frac{1}{n!} \frac{d^n}{dt^n} \exp(\sum_j y_j t^j) \big|_{t=0}$.   (5)

Let $\theta_{(n)}$ be a one-dimensional $n$-th order Brownian motion,
and set $\theta^j_{(n)}(t) = \int_0^t d\theta^j_{(n)}$,
$1 \le j \le n-1$. (See [10], [16].) Then

    $\mathbb{E} \exp(\sum_j y_j \theta^j_{(n)}(t)) = \exp((-1)^{(n/2)-1}\, t\, \phi_n(y_1,\ldots,y_{n-1}))$.   (6)
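The polynomials in (5) are simple to compute from the exponential generating function. The sketch below is an illustration under one explicit assumption: it takes $\phi_n$ to be the coefficient of $t^n$ in $\exp(\sum_j y_j t^j)$, that is, with the factor $1/n!$ included, which is the normalisation that makes (6) reduce to (2) when only $y_1$ is non-zero. It uses the recurrence $k\, e_k = \sum_j j\, y_j\, e_{k-j}$ that follows from differentiating $E = \exp(S)$; the function name is hypothetical.

```python
from fractions import Fraction

def appell_phi(n, y):
    # Coefficient of t^n in exp(y[0]*t + y[1]*t^2 + ...), i.e.
    # (1/n!) d^n/dt^n exp(sum_j y_j t^j) at t = 0 (assumed normalisation).
    # From E' = S' * E with E = exp(S):  k * e_k = sum_j j * y_j * e_{k-j}.
    e = [Fraction(1)]
    for k in range(1, n + 1):
        acc = Fraction(0)
        for j in range(1, min(k, len(y)) + 1):
            acc += j * Fraction(y[j - 1]) * e[k - j]
        e.append(acc / k)
    return e[n]

print(appell_phi(2, [3]))         # y1^2 / 2 with y1 = 3
print(appell_phi(4, [1, 1, 1]))   # y1^4/24 + y1^2*y2/2 + y2^2/2 + y1*y3 at (1,1,1)
print(appell_phi(4, [2, 0, 0]))   # y1^4 / 24 with y1 = 2
```

For $n = 4$ the expansion is $y_1^4/24 + y_1^2 y_2/2 + y_2^2/2 + y_1 y_3$, which shows how lower order terms enter alongside the pure fourth power.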


This leads us then to expect the following generalization of
(3). Let $X_{jk} = \sigma_{jk} \cdot \nabla$, $1 \le j \le n-1$,
$1 \le k \le \ell$, and $X_0 = \sigma_0 \cdot \nabla$ be smooth vector
fields. Let $\theta_{(n)} = (\theta_{(n)1},\ldots,\theta_{(n)\ell})$ be
$\ell$-dimensional. Then

    $\exp(t((-1)^{(n/2)-1} \sum_k \phi_n(X_{1k},\ldots,X_{n-1,k}) + X_0))$
    $\quad = \prod_0^t \mathbb{E} \exp(\sum_j \sum_k X_{jk}\, \theta^j_{(n)k}(du) + X_0\, du)$
    $\quad = \mathbb{E} \prod_0^t \exp(\sum_j \sum_k X_{jk}\, d\theta^j_{(n)k}(u) + X_0\, du)$,   (7)

and $\prod_0^t \exp(\sum_j \sum_k X_{jk}\, d\theta^j_{(n)k}(u) + X_0\, du)$
is to be interpreted as a random wave operator
$f(x) \mapsto f(\xi(t;x))$, where $\xi(\tau;x)$ is the solution of the
generalized Stratonovich differential equation

    $d\xi(\tau) = \sum_j \sum_k \sigma_{jk}(\xi) \circ d\tilde\theta^j_{(n)k}(\tau) + \sigma_0(\xi)\, d\tau$,  $\xi(0) = x$.   (8)

Again the use of the name Stratonovich here simply indicates
that the generator of $\xi$ is to have the invariant form
$(-1)^{(n/2)-1} \sum_k \phi_n(X_{1k},\ldots,X_{n-1,k}) + X_0$. We must
explain here, though, what is meant by $\phi_n(X_1,\ldots,X_{n-1})$ for
vector fields $X_1,\ldots,X_{n-1}$. Our convention is that all
monomials are evaluated as symmetric products. For example if
$a(y_1,y_2) = y_1 y_2$ and $b(y_1,y_2) = y_1 y_2^2$ then

    $a(X_1,X_2) = \frac{1}{2}(X_1 X_2 + X_2 X_1)$

    $b(X_1,X_2) = \frac{1}{3}(X_1 X_2^2 + X_2 X_1 X_2 + X_2^2 X_1)$.
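The symmetric product convention can be tried out on any pair of non-commuting objects. The sketch below is an illustration only: it uses $2 \times 2$ matrices as stand-ins for the vector fields (an assumption made so that products are easy to compute) and averages the product of the factors over all orderings. For the factor list $(X_1, X_2, X_2)$ each distinct ordering occurs twice among the six permutations, which reproduces the factor $\frac{1}{3}$ in $b(X_1,X_2)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def matmul(a, b):
    # Product of two square matrices given as lists of rows
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def symmetric_product(factors):
    # Average of the products of `factors` taken over all orderings
    n = len(factors[0])
    orderings = list(permutations(range(len(factors))))
    total = [[Fraction(0)] * n for _ in range(n)]
    for order in orderings:
        prod = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
        for idx in order:
            prod = matmul(prod, factors[idx])
        total = [[total[i][j] + prod[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(orderings) for entry in row] for row in total]

X1 = [[0, 1], [0, 0]]
X2 = [[0, 0], [1, 0]]
print(symmetric_product([X1, X2]))       # a(X1, X2) = (X1 X2 + X2 X1) / 2
print(symmetric_product([X1, X2, X2]))   # b(X1, X2) = (X1 X2^2 + X2 X1 X2 + X2^2 X1) / 3
```

When the factors commute every ordering gives the same product, so the symmetric product reduces to the ordinary one.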


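The operator
$T(\Delta) = \mathbb{E} \exp(\sum_j \sum_k X_{jk}\, \theta^j_{(n)k}(\Delta) + X_0 \Delta)$
is given by

    $f(x) \mapsto \int_{\mathbb{R}^\ell} f(\xi(\Delta;x,z;\Delta))\, p_n(z,\Delta)\, dz$,

where $\xi$ satisfies
$d\xi/d\tau = \sum_j \sum_k (z_k^j/\Delta)\, \sigma_{jk}(\xi) + \sigma_0(\xi)$,
$\xi(0) = x$.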

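Another approach to allow for lower order terms, based
on the Cameron-Martin formula, is that of Motoo [14] and
Nishioka [16]. These authors consider what amounts here to
product integrals

    $\prod_0^t \exp(\langle X, d\theta_{(n)}(u)\rangle + \sum_j \sum_k \alpha_{jk}(x)\, d\theta^j_{(n)k}(u) + \alpha_0(x)\, du)$,

where $X = (X_1,\ldots,X_\ell)$ are vector fields
$X_k = \sigma_k \cdot \nabla$, $1 \le k \le \ell$, the maps
$\sigma_k: \mathbb{R}^m \to \mathbb{R}^m$ are of class $C$ [superscript
illegible] and the maps $\alpha_{jk}: \mathbb{R}^m \to \mathbb{R}$,
$1 \le j \le n-1$, $1 \le k \le \ell$, and
$\alpha_0: \mathbb{R}^m \to \mathbb{R}$ are of class $C$ [superscript
illegible]. Following our earlier steps one can see that this
product integral ought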
to be the random wave operator

    $f(x) \mapsto f(\xi(t;x)) \cdot \exp\{\int_0^t \sum_j \sum_k \alpha_{jk}(\xi(\tau;x)) \circ d\tilde\theta^j_{(n)k}(\tau) + \alpha_0(\xi(\tau;x))\, d\tau\}$   (9)

where $\xi$ is the solution of the generalized Stratonovich
differential equation (4), and the stochastic integrals in (9)
are generalized Stratonovich integrals. This means that

    $\int_0^t \alpha(\tau) \circ d\theta^j(\tau) = \lim_n \sum_i \int_{t_{i-1,n}}^{t_{in}} \alpha(\tau)\, \frac{(\Delta_{in}\theta)^j}{\Delta_{in}}\, d\tau$.


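The operator

    $T(\Delta) = \mathbb{E} \exp(\langle X, \theta_{(n)}(\Delta)\rangle + \sum_j \sum_k \alpha_{jk}(x)\, \theta^j_{(n)k}(\Delta) + \alpha_0(x)\Delta)$   (10)

is given by

    $f(x) \mapsto \int_{\mathbb{R}^\ell} f(\xi(\Delta;x,\frac{z}{\Delta}))$
    $\quad \cdot \exp\{\int_0^\Delta [\sum_j \sum_k \alpha_{jk}(\xi(\tau;x,\frac{z}{\Delta}))\, \frac{z_k^j}{\Delta} + \alpha_0(\xi(\tau;x,\frac{z}{\Delta}))]\, d\tau\}\, p_n(z,\Delta)\, dz$,   (11)

where $\xi(\tau;x,z)$ satisfies $d\xi/d\tau = \sum_k z_k \sigma_k(\xi)$,
$\xi(0) = x$. The analogue to (7) is then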
    $\exp(t((-1)^{(n/2)-1} \sum_k \phi_n(X_k + \alpha_{1k}, \alpha_{2k},\ldots,\alpha_{n-1,k}) + \alpha_0))$
    $\quad = \prod_0^t \mathbb{E} \exp(\langle X, \theta_{(n)}(du)\rangle + \sum_j \sum_k \alpha_{jk}\, \theta^j_{(n)k}(du) + \alpha_0\, du)$
    $\quad = \mathbb{E} \prod_0^t \exp(\langle X, d\theta_{(n)}(u)\rangle + \sum_j \sum_k \alpha_{jk}\, d\theta^j_{(n)k}(u) + \alpha_0\, du)$,   (12)

where $\alpha_{jk}$ represents the operator which multiplies a
function by $\alpha_{jk}(x)$. The rule here for evaluating
$\phi_n(X + \alpha_1, \alpha_2,\ldots,\alpha_{n-1})$, where $X$ is a vector
field and $\alpha_j: \mathbb{R}^m \to \mathbb{R}$ are smooth functions,
is exactly as above in (7): the monomials in $\phi_n$ are
evaluated symmetrically. This representation (12) is
analogous to what Simon [19, §15] calls the Feynman-Kac-Ito
formula.

Actually the form of (9) appearing in Motoo [14] and
Nishioka [16] is the non-symmetric form, involving generalized
Ito rather than Stratonovich integrals. If one replaces the
Stratonovich integrals in (9) with their Ito counterparts,
then the operator $T(\Delta)$, representing the short time average,
would have to be given by

    $f(x) \mapsto \int_{\mathbb{R}^\ell} f(\xi(\Delta;x,\frac{z}{\Delta})) \cdot \exp\{\sum_j \sum_k \alpha_{jk}(x)\, z_k^j + \alpha_0(x)\Delta\}\, p_n(z,\Delta)\, dz$,   (13)

rather than (11). The counterpart to the left equality of
(12) is

    $\exp(t((-1)^{(n/2)-1} \sum_k \phi_n(X_k + \alpha_{1k}, \alpha_{2k},\ldots,\alpha_{n-1,k}) + \alpha_0)) = \prod_0^t T(du)$,   (14)
but now $\phi_n(X + \alpha_1, \alpha_2,\ldots,\alpha_{n-1})$ has to be
evaluated in its non-symmetric form, where all monomials
involving $X$ are evaluated by applying $X$ first (i.e., on the
right). In this case the coefficients need not be smooth, since
none of their derivatives arise in the generator. The different
generators corresponding to (11) and (13) reflect the difference
in the stochastic calculus stemming from the Stratonovich
and Ito integrals, respectively. For the special case
$\xi(t;x) = x + \theta_{(n)}(t)$ in (9) (i.e., $\sigma_k = 1$,
$1 \le k \le \ell$), which is the case studied by Motoo and Nishioka,
the conversion is simply based on

    $\int_0^t \alpha(x + \theta(\tau)) \circ d\theta^j(\tau) = \sum_{k=0}^{n-j} \frac{1}{(k+1)!} \int_0^t \alpha^{(k)}(x + \theta(\tau))\, d\theta^{j+k}(\tau)$

for $\theta$ one-dimensional. The advantage of the symmetric form
(11) over the non-symmetric form (13) is that the former
arises as a wave operator (namely (10)), whereas the latter
does not. In general wave operators obtained through
stochastic product integrals involve symmetric, or
Stratonovich, stochastic integrals and stochastic differential
equations, and correspondingly the generators take
a symmetric form.

To prove the left equality of (12) we use a special
case of the version of Chernoff's product formula appearing
in Pierre and Rihani [18].
Theorem I: Let $A$ be the generator of a linear contraction
$C_0$ semi-group, and let $\{T(t): t \ge 0\}$ be a family of linear
contractions satisfying

    $\lim_{t \downarrow 0} \frac{T(t)f - f}{t} = Af$,  $f \in D(A)$.   (16)

Then the strong product integral $\prod_0^t T(du)$ exists and equals
$\exp(tA)$.

We can extend this Theorem so as to allow $A$ to be the
generator of certain non-contractive $C_0$ semi-groups. The
operators $T(t)$ can be bounded linear operators satisfying
(16), provided there exists a constant $\omega \ge 0$ such that

    $\| T(t) \| \le e^{\omega t}$,  $t \ge 0$.   (17)

One simply replaces $T(t)$ with $e^{-\omega t} T(t)$ and $A$ with
$A - \omega$, and then Theorem I applies.
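In finite dimensions the content of Theorem I and its extension can be seen with a few lines of arithmetic. The sketch below is an illustration only, with assumptions that are not in the text: a fixed $2 \times 2$ matrix plays the role of $A$, the family is $T(s) = I + sA$ (which satisfies (16) exactly and a bound of the form (17) with $\omega = \|A\|$), the partition is uniform, and $\exp(tA)$ is computed by a truncated Taylor series for comparison.

```python
import math

def matmul(a, b):
    # Product of two 2x2 matrices given as lists of rows
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(a, t, terms=40):
    # Taylor series for exp(tA); adequate for the small matrix used here
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = matmul(term, [[t * a[i][j] / k for j in range(2)] for i in range(2)])
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def euler_family(a, s):
    # T(s) = I + sA, for which (T(s)f - f)/s = Af exactly
    return [[(1.0 if i == j else 0.0) + s * a[i][j] for j in range(2)]
            for i in range(2)]

def product_over_uniform_partition(a, t, doublings):
    # T(t/n)^n with n = 2**doublings, computed by repeated squaring
    power = euler_family(a, t / 2 ** doublings)
    for _ in range(doublings):
        power = matmul(power, power)
    return power

def max_abs_diff(p, q):
    return max(abs(p[i][j] - q[i][j]) for i in range(2) for j in range(2))

A = [[-1.0, 1.0], [0.0, -2.0]]
exact = mat_exp(A, 1.0)
for d in (2, 6, 10, 14):
    print(2 ** d, max_abs_diff(product_over_uniform_partition(A, 1.0, d), exact))
```

The printed errors shrink roughly in proportion to $1/n$, the same first-order behaviour that the product integral exploits in the operator setting.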
Characteristic Functions of a Class of Probability Distributions


Siegfried H. Lehnigk, Huntsville, Alabama
Research Directorate

Research, Development, and Engineering Center
U. S. Army Missile Command
Redstone Arsenal, Alabama 35393-5240


Summary. The characteristic function of a class of continuous one-sided
probability distributions is being considered. The distribution class
contains three independent parameters; one of them represents scale, the
other two determine initial and terminal shape of the associated
probability density function. The analytical properties of the
characteristic function depend heavily on the terminal shape parameter X
which may vary in the interval $(-\infty, 1)$. If 0 < X < 1, the
characteristic function is many-valued with branch points at zero and
infinity; its principal branch is holomorphic and bounded upon analytic
continuation (into the complex plane cut along the nonnegative real axis)
from the primary element which is holomorphic in the open left-hand plane.
If X = 0, the primary element of the characteristic function is
holomorphic in the half-plane left of the vertical line through the point
$(b^{-1}, 0)$, b being the scale parameter. Upon continuation it becomes
either a rational function (if the initial shape parameter is a nonpositive
integer) with a pole at the point $(b^{-1}, 0)$ or a many-valued function
with branch points at $(b^{-1}, 0)$ and infinity whose principal branch is
holomorphic in the plane cut along the real axis from $b^{-1}$ to infinity.
If X < 0, the characteristic function is an entire function of order
greater than unity. It has no real zeros but an infinity of conjugate
complex pairs of zeros even if the order is an even integer.


(To appear in Complex Variables: Theory and Application)
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Folucn  and  Extreme 


Value  Limit  Theorems  for  Markov  Random  Fields 


Simeon  M.  Berman 

Courant  Institute  of  Mathematical  Sciences 
New  fork  University 
251  Mercer  Street 
New  York,  NY  10012 


ABSQUCX 

Let  tm  be  the  integer  lattice  in  R*,  and  let  X^,  t  tz", 
be  a  Markov  random  field.  Let  ^  be  a  rectangular  box  in  lm 
with  corner  points  having  coordinates  of  the  fora  ♦  a.  De¬ 
fine  M,j  -  maxU^i  “t  €  1^) .  the  extreme  value  limit  problem  la 
aa  follows:  find  conditions  under  which  there  exist*  nonde¬ 
generate  distribution  function  G(x)  and  real  aequeacea  (an)  and 
(bn),  with  au>  0,  such  that  the  conditional  probability 

P(an”,(MB-b|1)  4  x  J  Given  X#t  a  f  boundary  of  1 n) 

convargea,  for  n  -eoo,  to  G(x)  at  all  points  of  continuity,  and 
for  all  possible  values  of  Za  on  the  boundary. 

Here  the  extreme  value  limit  problem  la  solved  for  a  gene¬ 
ral  claaa  of  Markov  random  fields,  fhe  conditions  on  the 
field  are  stated  in  terms  of  the  system  of  nearest  neighbor 
conditional  distributions.  These  distributions  are  assumed  to 
be  invariant  under  translations  in  Z*  (homogeneity).  Dobnisin's 
condition  for  regularity  and  mixing  is  also  assumed  to  hold, 
so  that  there  exists  a  unique  stationary  measure  P. 

In  addition  to  these  general  conditions, the  following  more 
special  conditions  are  also  assumed: 

1.  7or  fixed  t,  the  marginal  distribution  of  Z^  under  the 
stationary  measure  belongs  to  the  domain  of  attraction  of  an 
extreme  value  limiting  distribution  function  G(x)  with  nor¬ 
malising  sequences  (afi)  and  (bQ).  Shis  is  equivalent  to 

co^xt*  "  »<*>• 

2. For all possible values of $X_s$ for points $s$ which are
neighbors of 0,

$$P(X_0 > u \mid X_s,\ s \text{ a neighbor of } 0) = O(P(X_0 > u)), \quad \text{for } u \to \infty.$$

This  paper  has  been  accepted  for  publication  in  Advances 
in  Applied  Probability. 


This paper represents results obtained at the Courant Institute of
Mathematical Sciences, New York University, under the sponsorship of
the U.S. Army Research Office, Grant number DAAG-29-85-K-0146.
A BOUND ON THE VARIATION BETWEEN TWO PROBABILITY MEASURES
IN TERMS OF THE INTENSITIES OF A DISCRETE POINT PROCESS
RELATIVE TO THESE PROBABILITIES

G.  R.  Andersen 

U.S.  Army  Ballistic  Research  Laboratory  (BRL) 

1.  Introduction:  In  an  application  of  discrete  parameter  point  processes  to  a 
communication  network  problem  at  the  BRL  there  was  a  need  to  measure  the 
robustness  of  a  point  process  intensity.  That  is,  we  wanted  to  know  if  small  changes  in 
the  intensity  of  a  point  process  implied  small  changes  in  the  distribution  of  the  point 
process. H. Rost proved such a result for continuous parameter point processes having
absolutely  continuous  compensators,  in  the  1984  University  of  Strasbourg  Seminar  in 
Probability.  If  Rost  had  considered  the  case  where  the  compensator  was  absolutely 
continuous  with  respect  to  an  increasing  process  (instead  of  just  Lebesgue  measure),  it 
would  have  been  directly  applicable  to  the  discrete  point  process  model.  Rather  than 
extend  his  result  in  this  direction  here,  it  was  decided  to  see  what  would  be  required  to 
prove  an  analogous  result  totally  within  the  framework  of  discrete  point  processes. 
These  processes  are  sequences  of  Bernoulli  random  variables  (with  no  distributional 
assumptions  or  assumptions  concerning  independence)  and  so  are  of  fundamental 
importance  to  probability  theory. 

To  derive  a  discrete  parameter  analogue  of  Rost’s  result,  we  will  require  a  sequence  of 
four Lemmas. These Lemmas are known from the general theory (Jacod [1979], Itmi
[1980], Bremaud [1981]) where they are proved in the case of continuous parameter
marked point processes. Bremaud also treats in detail the case where the point process
has an absolutely continuous compensator. The latter case does not apply to discrete
point processes, but the form of the statements and the essential mechanics of the
proofs for the discrete case can be inferred from Bremaud's presentation. The
relationship of discrete point processes to the general marked point process of Jacod
[1975] is given in Andersen [1986, Chapter 4]. The mathematical setting considered by
Jacod  is  general  enough  to  allow  one  to  obtain  the  correct  statements  of  the  Lemmas  for 
a  discrete  point  process  by  the  simple  device  of  embedding  such  a  process  (and 
filtration)  in  a  continuous  parameter  process  (and  filtration)  which  is  constant  between 
integer  times. 

In  the  case  of  discrete  point  processes,  however,  it  is  extremely  easy  and  informative  to 
derive  these  Lemmas  directly  from  first  principles  and  this  is  what  we  will  do.  The 
discrete  analogue  of  Rost’s  Theorem  simply  does  not  follow  from  his  result  and  so  it  is 
derived  in  Section  5.  For  a  discussion  on  discrete  point  processes  and  their  use  in 
approximating  continuous  parameter  point  processes  one  can  refer  to  Brown  [1983]. 

Readers  not  already  familiar  with  the  relatively  new  martingale  techniques  might  find 
that the discrete form of stochastic calculus provides easy access to this area. These
techniques have wide applicability to engineering, physics and statistics; for a small
sample of applications to queuing, control, statistics, reliability and design of
experiments see Bremaud [1981], Jacobsen [1982], Gill [1980].

2. Notation and Preliminaries: Let $Z_+$ be the set of non-negative integers.
$(\Omega, F_\infty, (F_n), P)$ is called a filtered probability space if $F_\infty$ is a
$\sigma$-algebra of subsets of $\Omega$ and $P$ is a probability measure on $F_\infty$ with
$F_n$ a sub $\sigma$-algebra of $F_\infty$, for each $n \in Z_+$, and the sequence
$n \to F_n$ is increasing ($F_n$ is contained in $F_{n+1}$, for all $n \in Z_+$).
$X = (X_n, n \in Z_+)$ is said to be a stochastic process if each $X_n$ is a random
variable on $(\Omega, F_\infty)$. Let $\Delta X_k := X_k - X_{k-1}$ and define the process
$X_-$ by setting $(X_-)_n := X_{n-1}$ with $X_{-1} := 0$ for all $n \in Z_+$. As always,
the conditional expectation of a $P$-integrable random variable $Z$ given
the $\sigma$-algebra $F_k$ is written $E(Z \mid F_k)$. In what follows, a constantly (and
silently) used property of conditional expectation is that if $g$ is a bounded
$F_k$-measurable random variable, then $E(gZ \mid F_k) = gE(Z \mid F_k)$, a.s.P; the
abbreviation "a.s.P" means "almost surely relative to the probability P". Its use with
the last equation indicates that the random variables defined on either side of this
equation are only equal on an event whose probability is one.

Let $X = (X_n)$ and $V = (V_n)$ be processes on $(\Omega, F_\infty)$. Then the transform
of $X$ by $V$, denoted $V.X = ((V.X)_n)$, is the process defined by setting

$$(V.X)_n(\omega) := \sum_{k=0}^{n} V_k(\omega)\,\Delta X_k(\omega),$$

for all $\omega$ in $\Omega$. If $X$ is a square integrable process relative to $P$, then the
variance process of $X$ is denoted by $\langle X,X \rangle$ and is defined by

$$\langle X,X \rangle_n := \sum_{k=0}^{n} E((\Delta X_k)^2 \mid F_{k-1}).$$

$X = (X_n)$ is said to be adapted to the filtration $F = (F_n)$ iff $X_n$ is
$F_n$-measurable for each $n$, while $V = (V_n)$ is said to be previsible relative to the
filtration $F$ iff $V_n$ is $F_{n-1}$-measurable for each $n$. If $X$ is $F$-adapted, then
$X_-$ is $F$-previsible.

If $M$ is a discrete parameter process on the filtered probability space
$(\Omega, F_\infty, (F_n), P)$, then $M = (M_n, F_n)$ is an $(F,P)$-martingale iff

(i) $M$ is adapted to $F$,

(ii) $M$ has finite expectation,

(iii) $E(M_n \mid F_{n-1}) = M_{n-1}$ (a.s.P)
for all $n \in Z_+$.
2.0.1. Remark. The following examples are immediate consequences of the definitions.
(1°) If $M$ is an $(F,P)$-martingale and $V$ is $F$-previsible, then $V.M$ is an
$(F,P)$-martingale, if $V.M$ is $P$-integrable.

(2°) If $M$ is a square-integrable martingale, then $M^2 - \langle M,M \rangle$ is a martingale.

(3°) If $X$, $V$ and $V.X$ are square integrable processes, and $V$ is previsible, then
$\langle V.X, V.X \rangle_n = (V^2.\langle X,X \rangle)_n$, a.s.P, and
$E((V.X)_n^2) = E((V^2.\langle X,X \rangle)_n)$, if $X$ is a martingale.


Additional notation used includes writing the "indicator function" of a set $A$ as $1_A$.

3. Discrete Point Processes on a Measure Space: Let $(\Omega, F_\infty)$ be a measurable
space. Suppose that $(T_n, n \in Z_+)$ is a strictly increasing sequence of
$\bar{Z}_+ := Z_+ \cup \{\infty\}$ valued random variables relative to $(\Omega, F_\infty)$.
The statement that the sequence is "strictly increasing" means that for all $n \in Z_+$,

$$T_n < T_{n+1} \text{ on } [T_n < \infty] := \{\omega \in \Omega : T_n(\omega) < \infty\}.$$

Thus defined, the sequence $(T_n, n \in Z_+)$ is called a discrete point process (Dpp).

Given a discrete point process $(T_n, n \in Z_+)$, it is customary to introduce the
process $N = (N_t, t \ge 0)$ corresponding to $(T_n)$ by setting

$$N_t := \sum_{m \ge 1} 1_{[T_m \le t]} \tag{1}$$

for $t \ge 0$. $N_t(\omega)$ counts the number of times that members of the sequence
$(T_m(\omega), m \ge 1)$ fall in the interval $(0,t]$.

In the case treated here the "times" $T_n$ take their values in $\bar{Z}_+$; they are
"integer-valued". It follows then that

$$N_t = \sum_{m=1}^{[t]} 1_{[T_m \le t]} \tag{2}$$

(This is because, while finite, the $T_m$ are strictly increasing and integer-valued
functions, so at most $[t]$ of them can occur by time $t$.) Note that $[t]$ represents the
greatest integer less than or equal to $t$. There should be no confusion between this use
of brackets and their use in specifying sets, as in
$[T_m \le t] := \{\omega : T_m(\omega) \le t\}$.


For each $k \in Z_+$, set

$$X_k(\omega) := \sum_{m \ge 1} 1_{[T_m = k]}(\omega) \tag{3}$$

and $X_0(\omega) = 0$, for all $\omega \in \Omega$. Then it is easy to see that

$$N_t = \sum_{k=0}^{[t]} X_k, \quad t \ge 0. \tag{4}$$

(Just  insert  the  right  side  of  (3)  for  Xk  in  (4)  and  interchange  order  of  summation.)  It  is 
sufficient for our purposes then to consider the counting process in (4) as just the
stochastic sequence $N = (N_n, n \in Z_+)$.

Starting with the discrete point process $(T_n)$ we have defined the sequence
$(X_n, n \in Z_+)$ of Bernoulli 0,1-valued random variables on $(\Omega, F_\infty)$. Each of
the processes $N = (N_n)$ and $X = (X_n)$ is an equivalent representation of the discrete
point process $(T_n)$. For example, if we are given a Bernoulli sequence
$X = (X_n, n \in Z_+)$ relative to $(\Omega, F_\infty)$ we can define the sequence $(T_n)$ by
setting $T_0 := 0$ and

$$T_m = \inf\{k \in Z_+ : k > T_{m-1},\ X_k = 1\} \tag{6}$$

for $m \ge 1$, when $\{\cdots\} \ne \emptyset$, and equal to $\infty$ otherwise. The sequence
$N = (N_n)$ is defined by

$$\Delta N_n = N_n - N_{n-1} = X_n, \tag{7}$$

$N_{-1} = 0$, for $n \in Z_+$, so that

$$N_n = \sum_{k=0}^{n} X_k. \tag{8}$$
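The equivalence of the three representations can be illustrated with a short sketch. It is not part of the original paper: the function names and the finite horizon are assumptions made for illustration, and a jump time reported as infinite only means that no further jump occurs within the horizon.

```python
from itertools import accumulate

INF = float("inf")

def indicators_from_times(times, horizon):
    """X_k = 1 if some jump time T_m (m >= 1) equals k, for k = 0..horizon (eq. 3)."""
    hits = {t for t in times if t != INF}
    return [1 if k in hits else 0 for k in range(horizon + 1)]

def counts_from_indicators(x):
    """N_n = X_0 + ... + X_n (eq. 8)."""
    return list(accumulate(x))

def times_from_indicators(x, how_many):
    """T_m = inf{k > T_{m-1} : X_k = 1}, or infinity if there is no such k (eq. 6)."""
    times, last = [], 0  # T_0 := 0
    for _ in range(how_many):
        if last != INF:
            last = next((k for k in range(last + 1, len(x)) if x[k] == 1), INF)
        times.append(last)
    return times

# Hypothetical jump times T_1..T_4 observed up to time 8.
jump_times = [2, 3, 7, INF]
x = indicators_from_times(jump_times, horizon=8)
n = counts_from_indicators(x)
recovered = times_from_indicators(x, how_many=4)
```

Going from the jump times to the indicators and back returns the original jump times, which is the sense in which the sequences carry the same information.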


4. Discrete Processes on a Filtered Probability Space: We begin with the filtered
probability space $(\Omega, F_\infty, (F_n), P)$, a $P$-complete filtration ($F_0$ contains
all $P$-null sets) and

$$F_\infty = \sigma\Big(\bigcup_{n \ge 0} F_n\Big).$$

Under this set-up an $F$-adapted $\{0,1\}$-Bernoulli process on $(\Omega, F_\infty)$ with
$X_0 = 0$ on $\Omega$ is said to be an $(F,P)$-discrete point process (Dpp).

The process $N$ defined as in (8) is then an $F$-adapted process also and the sequence
$(T_n, n \in Z_+)$ defined by (6) is a sequence of $F$-stopping times. For the reasons noted
earlier all three sequences are called $(F,P)$-discrete point processes.

For each $n \in Z_+$, define the stochastic sequence $\lambda = (\lambda_n, n \in Z_+)$ by

$$\lambda_n := E(X_n \mid F_{n-1}), \tag{9}$$

$n \ge 1$, and $\lambda_0 = 0$ on $\Omega$. Then the process $\lambda$ is said to be the
$(F,P)$-intensity of the underlying $(F,P)$ discrete point process.

When there is no ambiguity about which filtration or probability measure is being used
we will sometimes drop one or both of the qualifiers $F$, $P$ and just refer to the
"intensity". On the other hand, when we must keep in mind that these processes depend on
$F$ and $P$ we will write, for example, $\lambda = (\lambda_n, F_n, P)$ or just
$\lambda = (\lambda_n, F_n)$. It will be understood that the index $n$ is in $Z_+$. The
following properties are immediate:

(a) $\lambda = (\lambda_n, F_n, P)$ is $F$-previsible and $0 \le \lambda_n \le 1$ a.s.P.

(b) $N = (N_n, F_n, P)$ is $F$-adapted with compensator $A_n = \sum_{k=0}^{n} \lambda_k$.
That is,

$$m_n = N_n - A_n \tag{10}$$

is an $(F,P)$-martingale.
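Property (b) can be checked exactly on a small example by enumerating every path of a Bernoulli sequence. The sketch below is illustrative only: the particular history-dependent intensity is a hypothetical choice, and a path is taken to be the tuple of values after the initial value 0.

```python
from fractions import Fraction
from itertools import product

def intensity(history):
    """Hypothetical previsible intensity: uses only the values seen so far."""
    return Fraction(1 + sum(history), 3 + len(history))

def path_probability(path):
    """Probability of a path, built from the one-step conditional probabilities."""
    p = Fraction(1)
    for n, x in enumerate(path):
        lam = intensity(path[:n])
        p *= lam if x == 1 else 1 - lam
    return p

def compensated(path):
    """m_n = N_n - A_n, with A_n the sum of the intensities along the path."""
    a = sum(intensity(path[:n]) for n in range(len(path)))
    return sum(path) - a

def expected_m(n):
    """E(m_n), computed exactly over all 2**n paths."""
    return sum(path_probability(p) * compensated(p)
               for p in product((0, 1), repeat=n))
```

Because a martingale starting at 0 has mean 0 at every time, `expected_m(n)` should be exactly zero for each `n`, whatever previsible intensity with values in [0, 1] is substituted.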


4.0.1. Remark: As noted in the introduction, the following four Lemmas are known
from the general theory (Jacod [1979], Bremaud [1981]) where they are proved in the
case of continuous parameter marked point processes. They are proved here totally
within the framework of discrete point processes for the purpose of exposition and in the
belief that Bernoulli variates are at the heart of most things probabilistic.


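4.1. Lemma:

Suppose that $N = (N_n, F_n)$ is a Dpp with $F$-intensity $\lambda = (\lambda_n, F_n)$. Let
$\mu = (\mu_k, F_k)$ be a strictly positive $F$-previsible process and
$\psi_k := 1 + \lambda_k(\mu_k - 1)$. Then $\psi_k > 0$ for all $k \in Z_+$ and

$$L_n = \prod_{k=0}^{n} \frac{\mu_k^{X_k}}{\psi_k} \tag{11}$$

for $n \in Z_+$ defines a positive $F$-martingale, $L = (L_n, F_n, P)$.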


4.1.1. Remark: $L_0 = 1$ on $\Omega$.


4.1.2. Remark: That $\psi_k$ is positive for all $k$ follows from (a) by treating the three
cases $\lambda_k = 0$, $\lambda_k = 1$, and $0 < \lambda_k < 1$. To show that
$L = (L_n, F_n)$ is a martingale just realize that since the $X$'s take values in $\{0,1\}$,

$$\mu_k^{X_k} = \mu_k X_k + (1 - X_k).$$

Then, by the $F$-previsibility of $\mu$ and the definition of $\lambda$,

$$E(\mu_k^{X_k} \mid F_{k-1}) = 1 + \lambda_k(\mu_k - 1) = \psi_k. \tag{12}$$

Since $\psi_k$ is $F$-previsible, it follows from (12) that

$$E\Big(\frac{\mu_k^{X_k}}{\psi_k} \Big| F_{k-1}\Big) = 1.$$

Hence for $n \ge 1$, since $L_n = L_{n-1}\,\mu_n^{X_n}/\psi_n$ and $L_{n-1}$ is
$F_{n-1}$-measurable,

$$E(L_n \mid F_{n-1}) = L_{n-1}\,E\Big(\frac{\mu_n^{X_n}}{\psi_n} \Big| F_{n-1}\Big) = L_{n-1}.$$

That is, $L$ is an $F$-martingale and $L$ is clearly positive.


4.1.3. Remark: Notice that since $L$ is a martingale and $L_0 = 1$ on $\Omega$,
$EL_n = EL_0 = 1$ for all $n \ge 0$.
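The normalization in Remark 4.1.3 can be confirmed exactly on a toy model. The following sketch is not from the paper: the intensity and the strictly positive previsible weight are hypothetical choices, and the product is taken over the values after the initial one, whose factor equals 1.

```python
from fractions import Fraction
from itertools import product

def lam(history):
    """Hypothetical previsible intensity with values strictly between 0 and 1."""
    return Fraction(1 + sum(history), 3 + len(history))

def mu(history):
    """Hypothetical strictly positive previsible weight."""
    return Fraction(2 + len(history), 1 + sum(history))

def likelihood_and_prob(path):
    """Return (L_n, P(path)) where L_n is the product of mu^X / psi along the path."""
    big_l, p = Fraction(1), Fraction(1)
    for k, x in enumerate(path):
        h = path[:k]
        l, m = lam(h), mu(h)
        psi = 1 + l * (m - 1)
        big_l *= (m if x == 1 else 1) / psi
        p *= l if x == 1 else 1 - l
    return big_l, p

def expected_l(n):
    """E(L_n), computed exactly by summing over all 2**n paths."""
    total = Fraction(0)
    for path in product((0, 1), repeat=n):
        big_l, p = likelihood_and_prob(path)
        total += big_l * p
    return total
```

With exact rational arithmetic, `expected_l(n)` should equal 1 for every `n`, in line with the martingale property of the product.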
4.1.4. Remark: It is sometimes useful to write $\mu_k^{X_k} = 1 + (\mu_k - 1)X_k$ in the
definition of $L_n$. This is about all that is needed to prove the following discrete
analogue of a result due to C. Doleans-Dade.


4.2. Lemma: ($\mu$, $\psi$ and $\lambda$ as in Lemma 4.1.)
Set

$$g_k = (\mu_k - 1)/\psi_k. \tag{13}$$

If

$$L_n = \prod_{k=0}^{n} \frac{\mu_k^{X_k}}{\psi_k}, \tag{14}$$

then

$$L_n = 1 + (gL_-.m)_n \tag{15}$$

and conversely.

4.2.1. Remark: Just observe that from (13) and (14) with $n \ge 1$,

$$L_n - L_{n-1} = L_{n-1}\big(1 + (\mu_n - 1)X_n - \psi_n\big)/\psi_n = L_{n-1}\,g_n\,\Delta m_n,$$

where $m$ is defined in (b) as $m = N - A$, so that $\Delta m_n = X_n - \lambda_n$. Summing
both sides of the first equation gives

$$L_n - L_0 = \sum_{k=1}^{n} L_{k-1}\,g_k\,\Delta m_k = (L_- g.m)_n,$$

since $\Delta m_0 = X_0 - \lambda_0 = 0$. Because $L_0 = 1$ we have (15). The converse
follows by reversing the argument.
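The identity between the product form and the transform form can be checked path by path. This is a sketch under assumed inputs: the intensity and weight functions are hypothetical, and each path lists the values after the initial one.

```python
from fractions import Fraction
from itertools import product

def lam(history):
    """Hypothetical previsible intensity."""
    return Fraction(1 + sum(history), 3 + len(history))

def mu(history):
    """Hypothetical strictly positive previsible weight."""
    return Fraction(2 + len(history), 1 + sum(history))

def both_sides(path):
    """Return (product form of L_n, 1 + transform of g*L_- by m) along one path."""
    big_l = Fraction(1)       # L_0 = 1
    transform = Fraction(0)   # running sum of L_{k-1} * g_k * (X_k - lambda_k)
    for k, x in enumerate(path):
        h = path[:k]
        l, m = lam(h), mu(h)
        psi = 1 + l * (m - 1)
        g = (m - 1) / psi
        transform += big_l * g * (x - l)   # big_l still holds L_{k-1} here
        big_l *= (m if x == 1 else 1) / psi
    return big_l, 1 + transform

# The two representations should agree on every path, not only on average.
agree = all(a == b for a, b in
            (both_sides(p) for p in product((0, 1), repeat=5)))
```

The agreement is pathwise, which reflects that the argument in the remark is purely algebraic and uses no probabilistic property of the sequence.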

4.2.2. Remark: We continue with $\lambda = (\lambda_n, F_n, P)$ as the $(F,P)$-intensity of
a Dpp $X = (X_n, F_n, P)$.


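4.3. Lemma:

Let $L$, $\mu$ and $\psi$ be defined as in Lemma 4.1. Define a probability measure
$\tilde{P}$ on $(\Omega, F_\infty)$ by setting

$$\tilde{P}(A) = \int_A L_v(\omega)\,P(d\omega), \tag{16}$$

for all $A \in F_\infty$. Then $X = (X_n, F_n, \tilde{P})$ is a Dpp with
$(F,\tilde{P})$-intensity $\alpha$, where

$$\alpha_k = \lambda_k \mu_k/\psi_k, \quad \text{a.s.P}, \quad k = 1, 2, \ldots, v. \tag{17}$$


4.3.1. Remark: From (17), notice that
$0 \le \alpha_k = \mu_k\lambda_k/(1 - \lambda_k + \mu_k\lambda_k) \le 1$, a.s.P, as it
should.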
4.3.2. Remark: Following Lemma 4.1, we noticed that the positive martingale of that
Lemma had the property that $E(L_n) = 1$. It follows that the measure $\tilde{P}$ defined
above is a probability measure. If we let $E_Q Y$ denote the expectation of $Y$ relative to
the probability measure $Q$, then Bucy's Lemma (Bremaud [1981, p171]) allows us to write

$$E_{\tilde{P}}(X_n \mid F_{n-1})\,E_P(L_n \mid F_{n-1}) = E_P(L_n X_n \mid F_{n-1}).$$

Hence, writing $E$ for $E_P$,

$$\alpha_n = E_{\tilde{P}}(X_n \mid F_{n-1}) = \psi_n^{-1}\,E(\mu_n^{X_n} X_n \mid F_{n-1}),$$

since

$$E(L_n X_n \mid F_{n-1}) = L_{n-1}\,\psi_n^{-1}\,E(\mu_n^{X_n} X_n \mid F_{n-1}),$$

$$E(L_n \mid F_{n-1}) = L_{n-1}\,\psi_n^{-1}\,E(\mu_n^{X_n} \mid F_{n-1}) = L_{n-1}\,\psi_n^{-1}\,\psi_n = L_{n-1},$$

and $L_{n-1} > 0$. The conclusion of the Lemma then follows by recalling that $\mu$ is
$F$-previsible and noting that $\mu_n^{X_n} X_n = \mu_n X_n$ ($X_n$ takes only the values 0
and 1).

4.3.3. Remark: It follows immediately that $\alpha = 1$ iff $\lambda = 1$ and $\alpha = 0$
iff $\lambda = 0$, which is useful in attempting to solve (17) for $\mu$ in Section 5.
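The change of intensity in Lemma 4.3 can also be verified exactly on a small example by reweighting each path with its likelihood factor and recomputing the one-step conditional probability of a point. The intensity and weight below are hypothetical, and the function names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

def lam(history):
    """Hypothetical intensity under the original measure."""
    return Fraction(1 + sum(history), 3 + len(history))

def mu(history):
    """Hypothetical strictly positive previsible weight."""
    return Fraction(2 + len(history), 1 + sum(history))

def psi(history):
    return 1 + lam(history) * (mu(history) - 1)

def tilted_prob(path):
    """New-measure probability of a path: original probability times L_n."""
    value = Fraction(1)
    for k, x in enumerate(path):
        h = path[:k]
        p_step = lam(h) if x == 1 else 1 - lam(h)
        l_step = (mu(h) if x == 1 else 1) / psi(h)
        value *= p_step * l_step
    return value

def tilted_intensity(history):
    """Conditional probability of a point at the next step under the new measure."""
    p1 = tilted_prob(history + (1,))
    p0 = tilted_prob(history + (0,))
    return p1 / (p1 + p0)

def alpha(history):
    """Intensity predicted by (17): lambda * mu / psi."""
    return lam(history) * mu(history) / psi(history)
```

For every history, `tilted_intensity` and `alpha` should return the same rational number, which is the content of (17) in this finite setting.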

4.3.4. Remark: In the last Lemma we used the positive martingale $L$ to define a
probability measure $\tilde{P}$ which was absolutely continuous relative to $P$. In the next
Lemma we give a "converse" of that result. For this purpose we take
$F_k = F_k^N := \sigma(N_1, \ldots, N_k)$, where $N$ is a Dpp with $(F^N,P)$ intensity
$\lambda$.


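4.4. Lemma:

Let $\tilde{P}$ and $P$ be probability measures on $(\Omega, F_\infty)$,
$F_\infty = \sigma(\bigcup_k F_k^N)$, and suppose that $\tilde{P}$ is absolutely continuous
with respect to $P$,

$$\tilde{P} \ll P.$$

Let $\tilde{P}_n$ and $P_n$ be the restrictions of $\tilde{P}$ and $P$, respectively, to
$F_n^N$, and define

$$L_n := \frac{d\tilde{P}_n}{dP_n}, \tag{18}$$

the Radon-Nikodym derivative of $\tilde{P}_n$ relative to $P_n$. Then there exists a
positive, $F^N$-previsible process $(\mu_k)$ such that

$$L_n = \prod_{k=0}^{n} \frac{\mu_k^{X_k}}{\psi_k} \tag{19}$$

where, as before, $\psi_k = 1 + \lambda_k(\mu_k - 1)$, and so $N$ is an
$(F^N,\tilde{P})$ Dpp with $(F^N,\tilde{P})$ intensity

$$\alpha_k = \lambda_k \mu_k/\psi_k, \tag{20}$$

for all $k \in Z_+$.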
4.4.1. Remark: The general idea of the proof of this Lemma is to note that $L$, as
defined in (18), is an $(F^N,P)$-martingale relative to the filtration generated by the
discrete point process. This fact can be used to write $L$ in the form of equation (15) in
such a way that $g$ in this equation is previsible. Lemma 4.2, with $\mu$ defined through
(13) and $g$, then applies and $L$ has the required product representation (19). The form
of the new point process intensity then follows from Lemma 4.3.

5. Rost's Theorem for Discrete Point Processes: Let $F$ be the filtration used in
the last Lemma, $F_n^N = \sigma(N_k, k \le n)$.

Let $\tilde{P}$ and $P$ satisfy the assumptions of Lemma 4.4 and define
$L = (L_n, F_n^N, P)$ as in equation (18). Then this Lemma together with Lemma 4.2 says that
$L$ satisfies the following transform ("integral") equation

$$L_n = 1 + (L_-.\xi)_n, \tag{21}$$

where $\xi_n = (g.m)_n$ is an $(F_n^N,P)$ martingale by 1° of Section 2, since
$m = N - A$ as defined in equation (10) is such a martingale and $g$ given by (13) is $F^N$
previsible.

The variation of the two probability measures $P$ and $\tilde{P}$ is connected to the
process $L$ through

$$P_v(A) - \tilde{P}_v(A) = \int_A (1 - L_v)\,dP, \tag{22}$$

for all $A \in F_v^N$, where $v$ is some fixed positive integer. Since $P_v$ and
$\tilde{P}_v$ are the restrictions of $P$ and $\tilde{P}$ to $F_v^N$ and
$F_v^N \subset F_\infty$ we can drop the subscripts on the left side of (22).

Roussas [1972] shows that the total variation,
$\mathrm{Var}_v(P,\tilde{P}) := \mathrm{Var}(P_v,\tilde{P}_v)$, between $P_v$ and
$\tilde{P}_v$ (or between $P$ and $\tilde{P}$ on $F_v^N$) defined by

$$\mathrm{Var}_v(P,\tilde{P}) := \sup\{\,|P(A) - \tilde{P}(A)| : A \in F_v^N\,\}$$

satisfies

$$\mathrm{Var}_v(P,\tilde{P}) = E\big((1 - L_v)\,1_{[L_v < 1]}\big), \tag{23}$$

where the expectation on the right is with respect to the probability $P$.

Now we follow along the lines of Rost's proof to obtain a bound on
$\mathrm{Var}_v(P,\tilde{P})$. The form of his bound is of course different from the one
that will be obtained here since his compensator is absolutely continuous relative to
Lebesgue measure.

To obtain a bound on the left member of equation (23), we will decompose the right
member into two parts in such a way that one part is small and the other is small only
when the $(F^N,\tilde{P})$-intensity $\alpha$ and the $(F^N,P)$-intensity $\lambda$ are, in
some sense, close.

Since $L_0 = 1$ and $L_v > 0$ on $\Omega$, we decompose the event $[L_v < 1]$ into the union
of two disjoint events: $[1 - \varepsilon < L_v < 1]$ and $[0 < L_v \le 1 - \varepsilon]$. On
the first event, $1 - L_v$ is between 0 and $\varepsilon$ and on the second it is between
$\varepsilon$ and 1. It follows from (23) that

$$\mathrm{Var}_v(P,\tilde{P}) \le \varepsilon + P(\varepsilon \le 1 - L_v < 1). \tag{24}$$
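Let

$$S = \inf\{k \in Z_+ : k \le v \text{ and } |1 - L_k| \ge \varepsilon\}, \tag{25}$$

if $\{\cdots\} \ne \emptyset$ and equal $v$ otherwise. Then

$$P(\varepsilon \le 1 - L_v < 1) \le P(\max_{k \le v} |1 - L_k| \ge \varepsilon) \le P(\varepsilon \le |1 - L_S|) \le \frac{1}{\varepsilon^2}\,E((1 - L_S)^2) = \frac{1}{\varepsilon^2}\,E((L_-.\xi)_S^2), \tag{26}$$

the last equality being due to (21). Using (3°) of Section 2, we have

$$E((L_-.\xi)_S^2) = E((L_-^2.\langle \xi,\xi \rangle)_S), \tag{27}$$

where

$$\Delta\langle \xi,\xi \rangle_k = \Delta\langle g.m, g.m \rangle_k = g_k^2\,\Delta\langle m,m \rangle_k = g_k^2\,\lambda_k(1 - \lambda_k). \tag{28}$$

In order to obtain the last equality from $\Delta\langle m,m \rangle_k$ refer to Section 2.
Then

$$E((X_k - \lambda_k)^2 \mid F_{k-1}) = E(X_k^2 \mid F_{k-1}) - 2\lambda_k E(X_k \mid F_{k-1}) + \lambda_k^2 = \lambda_k(1 - \lambda_k),$$

since $X_k^2 = X_k$. From equations (25) through (28), we find that

$$P(\varepsilon \le 1 - L_v < 1) \le \frac{1}{\varepsilon^2}\,E\sum_{k=1}^{S} L_{k-1}^2\,g_k^2\,\lambda_k(1 - \lambda_k) \le \Big(\frac{1+\varepsilon}{\varepsilon}\Big)^2 E\sum_{k=1}^{S} g_k^2\,\lambda_k(1 - \lambda_k), \tag{29}$$

where we have used the definition of the stopping time $S$ which provides
$L_{k-1}(\omega) < 1 + \varepsilon$, for $k$ going from 1 to $S(\omega)$. Using Remark 4.3.3
and equations (13) and (17), one can show that

$$g_k^2\,\lambda_k(1 - \lambda_k) = \begin{cases} 0, & \text{on } [\lambda_k = 0 \text{ or } \lambda_k = 1] \\ (\alpha_k - \lambda_k)^2/\lambda_k(1 - \lambda_k), & \text{on } [0 < \lambda_k < 1] \end{cases} \tag{30}$$

Therefore, since $P(S \le v) = 1$ and the quantities in (30) are non-negative, we can
replace $S$ in the expectation on the right of (29) by $v$ to obtain

5.1. Theorem:

Let $\tilde{P}$ and $P$ be probability measures on the measurable space
$(\Omega, F_\infty)$ with $\tilde{P} \ll P$. Let $\varepsilon$ be any real number such that
$0 < \varepsilon < 1$. If $N$ is a discrete point process with an $F^N$-intensity $\alpha$
relative to $\tilde{P}$ and an $F^N$-intensity $\lambda$ relative to $P$, then

$$\mathrm{Var}_v(P,\tilde{P}) \le \varepsilon + \Big(\frac{1+\varepsilon}{\varepsilon}\Big)^2 E\sum_{k=1}^{v} g_k^2\,\lambda_k(1 - \lambda_k), \tag{31}$$

where the summands on the right satisfy (30).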
5.1.1. Remark: Recall that the expectation,

$$C_v := E\sum_{k=1}^{v} g_k^2\,\lambda_k(1 - \lambda_k),$$

on the right of (31) does not depend on $\varepsilon$ and that there are no constraints on
$\varepsilon$ other than that it is in the open interval (0,1). So, if the non-negative
quantity $C_v$ is less than 1, we can choose $\varepsilon = C_v^{1/3}$. Then

$$\varepsilon + \Big(\frac{1+\varepsilon}{\varepsilon}\Big)^2 C_v = C_v^{1/3} + (1 + C_v^{1/3})^2\,C_v^{1/3} \le 5C_v^{1/3}. \tag{32}$$

Therefore, (31) and (32) yield the following

5.2. Theorem:

Under the assumptions of Theorem 5.1,

$$\mathrm{Var}_v(P,\tilde{P}) \le 5C_v^{1/3}. \tag{33}$$

5.2.1. Remark: Notice that since (23) holds and $L_v > 0$, a.s.P, we always have
$\mathrm{Var}_v(P,\tilde{P}) \le 1$. Hence, (33) is trivial when $C_v$ is not less than 1.
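The arithmetic behind the choice of the cube root can be sketched numerically. The function names below are assumptions of this illustration; the first evaluates the right side of (31) for a given value of the expectation and of the free parameter, and the second evaluates the right side of (33), capped at the trivial bound 1.

```python
def rost_bound(c_v, eps):
    """Right side of (31): eps + ((1 + eps) / eps)**2 * C_v, for 0 < eps < 1."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in the open interval (0, 1)")
    return eps + ((1 + eps) / eps) ** 2 * c_v

def cube_root_bound(c_v):
    """Right side of (33), capped at 1 since the variation never exceeds 1."""
    return min(1.0, 5 * c_v ** (1 / 3))

# With eps equal to the cube root of C_v, (31) should never exceed 5 * C_v**(1/3).
checks = [(c, rost_bound(c, c ** (1 / 3)), 5 * c ** (1 / 3))
          for c in (1e-6, 1e-3, 0.1, 0.9)]
```

The cap in `cube_root_bound` shows when (33) is informative: the bound only falls below 1 once the expectation is smaller than 1/125.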

5.2.2. Remark: Only the form of $C_v$ differs from Rost's result.

5.2.3. Remark: The bound in (31) or (33) differs considerably from the usual $L_1$-bound
discussed in Kabanov, Liptser and Shiryaev [1983] and Serfling [1978], Lemma 6.1.

5.2.4. Remark: The following example is due to Rost. It illustrates the use of (31) or
(33). Suppose that $\alpha_k = \alpha_k^{(v)}$, $1 \le k \le v$, $\lambda_k = \lambda$, a
constant, and $|\alpha_k - \lambda| = O(v^{-\delta})$ where $1 > \delta > .5$. Then

$$0 \le E\sum_{k=1}^{v} g_k^2\,\lambda_k(1 - \lambda_k) \le C(1/v)^{2\delta - 1} \to 0,$$

as $v \to \infty$ and so $\mathrm{Var}_v(P,\tilde{P}) \to 0$.

By contrast, under these assumptions an $L_1$-bound on the total variation would be
unbounded. This doesn't mean that either type of bound is better or worse than the
other, just different.
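A numerical sketch of this example follows. It makes two assumptions that are not in the text: the perturbed intensity differs from the constant one by exactly the stated power of the horizon at every step, and the quantity behind an L1-type bound is taken to be the sum of the absolute differences of the two intensities.

```python
def c_v(alphas, lams):
    """Sum of the terms in (30) for deterministic intensities."""
    return sum((a - l) ** 2 / (l * (1 - l)) if 0 < l < 1 else 0.0
               for a, l in zip(alphas, lams))

def l1_sum(alphas, lams):
    """Assumed L1-type quantity: the sum of |alpha_k - lambda_k|."""
    return sum(abs(a - l) for a, l in zip(alphas, lams))

def example(v, lam=0.3, delta=0.75):
    """Constant intensity lam, perturbed by v**(-delta) at each of v steps."""
    lams = [lam] * v
    alphas = [lam + v ** (-delta)] * v
    return c_v(alphas, lams), l1_sum(alphas, lams)

small = example(100)
large = example(10_000)
```

As the horizon grows, the first component shrinks like a negative power of the horizon while the second grows, which matches the contrast drawn in the remark.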
ABSTRACT. We survey some estimation problems in the area of life
history analysis. The problems we describe involve estimating an arbitrary
life-distribution and the transition probabilities of a Markov chain. In our
discussion we emphasize the role of the observation scheme, for example
survival testing versus renewal testing, and the role of the product-limit
estimator. In this connection we demonstrate the need for the family of
Poisson type counting processes in developing a unified methodology for
solving these problems.

1.  INTRODUCTION.  Many  areas  of  science  such  as  demography,  medicine, 
industrial  reliability  and  epidemiology  give  rise  to  phenomena  involving  life 
histories whose description characterizes a family of stochastic processes.

Our  interest  lies,  in  particular,  in  problems  where  individual  life  histories 
are  viewed  as  realizations  of  a  stochastic  process  moving  among  states  in  a 
discrete  state  space  (pure  jump  processes).  For  such  processes  the  states 
denote  the  status  of  an  individual  (insurance  policy,  technical  component, 
etc.)  and  transitions  between  states  denote  events  of  interest. 

To  fix  ideas  consider  a  problem  in  epidemiology  where  one  studies  the 
relationship  between  a  particular  exposure  and  the  incidence  of  any  disease 
that  may  develop.  For  example,  healthy  individuals  may  be  initially 
classified  with  regard  to  cigarette  exposure  and  followed  forward  in  time  to 
determine  which  of  heart  disease  or  lung  cancer  develops.  Thus  "health," 
"heart  disease"  and  "lung  cancer"  are  three  states  in  an  individual's  life 
history  and  an  event  occurs  when  the  individual  moves  from  a  healthy  state  to 
one  of  the  diseased  states.  This  is  the  subject  of  cohort  analysis  where  the 
object  of  interest  is  the  effect  of  exposure  on  rate  of  disease  incidence 
(see  Breslow  (1985)). 

The  example  above  highlights  a  salient  feature  of  problems  in  the  area 
of  life  history  analysis.  That  is,  individual  life  histories  are  influenced 
by  the  presence  of  auxiliary  processes,  such  as  cigarette  exposure,  which  are 
seen to affect the rate at which events occur. A similar motivation lies


Key  words:  Life-testing,  Markov  chains,  censoring,  product-limit  estimator, 
martingale,  Poisson  type  counting  process. 
behind some random shock and wear models of system reliability where, in
addition to system age, the hazard rate of time to failure depends on
environmental stresses. In recent years statistical analysis for such
dynamical phenomena has relied on counting processes and their
compensators. A germinal paper in this regard is Aalen (1978) who first
introduced  the  multiplicative  intensity  model  to  various  life  history 
problems  such  as  survival  analysis.  Since  then  a  great  deal  of  activity  in 
this  area  has  taken  place  and  an  excellent  exposition  of  the  role  of  counting 
processes  in  life  history  analysis  is  to  be  found  in  Andersen  and  Borgan 
(1985). 

Counting  process  models  based  on  intensities  generalize  the  Poisson 
process  as  a  model  for  random  events  in  time.  These  models  assume  the 
natural  time  parameter  to  be  continuous.  However,  some  phenomena  such  as 
consumer  loan  repayment  behavior  naturally  occur  in  discrete-time.  Moreover, 
longitudinal  data  sets  in  sociology  often  arise  from  panel  designs  which 
generate  observed  stochastic  processes  with  discrete-time  parameter.  In  our 
work  we  have  found  it  useful  to  consider  a  family  of  counting  processes 
which,  in  the  terminology  of  Liptser  and  Shiryaev  (1978),  we  call  Poisson 
type  counting  processes.  These  counting  processes  are  characterized  in 
section 2 by their compensators whose pathwise Radon-Nikodym derivative
relative  to  a  fixed  Borel  measure  is  an  observable  predictable  process.  The 
model  generalizes  the  multiplicative  intensity  and  allows  for  a  unified 
treatment  of  mixed  discrete  and  continuous-time  problems.  We  describe 
several  examples  including  survival  analysis  with  arbitrary  distribution 
measure  and  Markov  chains.  In  statistical  applications  each  example  gives 
rise  to  an  estimation  problem  which  we  reduce  to  that  of  estimating  the  Borel 
measure  mentioned  above.  In  sections  3  and  4  we  survey  some  recent  results 
in  this  area  which  rely  on  the  martingale  dynamics  over  point  processes  as 
discussed,  for  example,  by  Liptser  and  Shiryaev  (1978),  Jacod  (1975)  and  Boel 
et  al.  (1975). 

2. POISSON TYPE COUNTING PROCESSES. We define the family of Poisson
type counting processes and give a number of worked examples. Let
$(\Omega, \mathcal{F}, P)$ denote a probability space and
$F = \{\mathcal{F}_t, t \ge 0\}$ a given family of sub-$\sigma$-algebras of
$\mathcal{F}$ where $F$ is nondecreasing, right-continuous and complete
relative to $P$. All of the standard terminology used below, such as adapted,
predictable, and compensator, is defined in, for example, Métivier (1982),
and Liptser and Shiryaev (1978).

Let $N = \{N_t, \mathcal{F}_t, t \ge 0\}$ denote a counting process defined on
$(\Omega, \mathcal{F}, P)$ so that the sample paths of $N$ are right continuous step
functions with jumps of size +1, $N_0 = 0$ and $N_t$ is an $\mathcal{F}_t$-measurable
random variable. Let $B$ denote a fixed Borel measure over the Borel sets in
$R_+ = [0,\infty)$ and let $Y = \{Y_t, \mathcal{F}_t, t \ge 0\}$ denote a nonnegative
predictable process.

Definition 2.1. If $N$ has compensator $A = \{A_t, \mathcal{F}_t, t \ge 0\}$ relative
to $F$ given by

$$A_t = \int_{(0,t]} Y_s\,B\{ds\},$$
then we say $N$ is a Poisson type counting process.


By definition the process $M = \{N_t - A_t, \mathcal{F}_t, t \ge 0\}$ is a local square
integrable martingale and, in the terminology of Jacod (1975), the kernel
$A\{dt\}$ is the dual predictable projection to $N$ and is unique to within
stochastic equivalence. If $Y_t = \lambda > 0$ is constant for all $t \ge 0$ and $B$
denotes Lebesgue measure, then $A_t = \lambda t$ and $N$ is a simple Poisson point
process. More generally, if $B$ is absolutely continuous relative to
Lebesgue measure $\mu$, then definition 2.1 gives rise to the multiplicative
intensity model where the intensity is given by $Y\,dB/d\mu$.

In life history analysis problems involving an event of a single type
$N_t$ denotes the number of occurrences of this event over $(0,t]$ and $A_t$
denotes the cumulative conditional rate of event occurrence over $(0,t]$. The
actual  composition  of  the  conditional  rate  depends  explicitly  on  the 
underlying  filtration  F,  often  called  a  history,  so  that  specification  of 
the  relative  richness  of  F  is  important.  In  applications  Y  plays  the 
role  of  auxiliary  process  which  may  be  some  measure  of  environmental  exposure 
or  censoring.  For  example,  in  epidemiology  Y  might  denote  a  measure  of 
cigarette  exposure. 

Consider  the  following  examples  of  Poisson  type  counting  processes. 

Example 2.1. Survival analysis with censored data. For each $n \ge 1$
let $X_i$ and $U_i$, $i = 1,\ldots,n$ denote $2n$ independent positive random
variables defined on a probability space $(\Omega, \mathcal{F}, P)$ with $X_i$ or $U_i$
almost surely finite for each $i$. $X_i$ has distribution measure $G$ and $U_i$ has
distribution measure $H$. The observable random variables $\tilde{X}_i$ and $\delta_i$
are given by $\tilde{X}_i = \min(X_i, U_i)$ and $\delta_i = 1(X_i \le U_i)$, where
$1(A)$ is the indicator function of event $A$. In applications $X_i$ denotes the
survival time and $U_i$ denotes a censoring time so that this is a model for random
right censorship.

A history $F = \{\mathcal{F}_t, t \ge 0\}$ will have to record the progress in the
lifetimes of the individuals or components under test. In this case the
natural history is given by

$$\mathcal{F}_t = \mathcal{F}_0 \vee \sigma(\tilde{X}_i \le s,\ \delta_i 1(\tilde{X}_i \le t),\ s \le t,\ i = 1,\ldots,n)$$

where $\mathcal{F}_0$ contains the $P$-null sets of $\mathcal{F}$ and their subsets.

In the counting process formulation of this model for each $i = 1,\ldots,n$
we define $N(i) = \{1(\tilde{X}_i \le t, \delta_i = 1), t \ge 0\}$ and
$Y(i) = \{1(\tilde{X}_i \ge t), t \ge 0\}$
and let $B$ denote the Borel measure generated by the function

$$B(t) = \int_{(0,t]} (1 - G(s-))^{-1}\,G\{ds\}, \quad t \ge 0,$$

where $G(s-) = \lim_{u \uparrow s} G(u)$. Clearly the process $N(i)$ is a counting process
equal to zero until the ith survival time elapses and has not been censored
while $Y(i)$ is called a risk process and is equal to one as long as the ith
unit remains alive or at risk to failure.

According  to  theorem  3.1.1,  Gill  (1980)  N(i)  has  compensator 
A(i)  =  )At(i),t  £  0?  relative  to  F  given  by 


t 

At(i)  =  J  1(X.  *  s)B|ds( . 
0 


Since  Y(i)  is  a  left-continuous  process  it  is  predictable  (see  for  example 
Br6maud  (1981))  so  that  A(i)  satisfies  definition  2.1  and  N(i)  is  a 
Poisson  type  counting  process.  To  prove  that  A(i)  is  the  compensator  to 
N(i)  one  can  verify  the  martingale  property  directly  using  properties  of 
conditional  distribution*  S> 

In the example above the function $B(t)$, $t \ge 0$, may be interpreted as
the  cumulative  age-specific  mortality  rate  for  an  average  or  baseline 
individual.  If  the  sampling  population  is  homogeneous  with  respect  to 
mortality,  this  is  a  reasonable  model  of  failure  rate.  On  the  other  hand, 
for  heterogeneous  sampling  populations  it  is  preferable  to  model  failure  rate 
as  a  function  of  an  auxiliary  random  variable. 

Example 2.2. Failure rate as a function of a random variable. Let
$(\Omega, \mathcal{F}, P)$ denote a probability space on which two positive random
variables $X$ and $Z$ are defined. The random variable $X$ models survival time
whereas $Z$ denotes a measure of a characteristic of the unit of observation
from a heterogeneous population or some environmental exposure. The effect
of $Z$ on failure rate is modeled as follows. Let $G(\cdot\,;Z)$ denote the
conditional distribution of $X$ given $Z$ so that

$$G(t;Z) = P(X \le t \mid Z) = 1 - \exp\{-Z B(t)\}, \quad t \ge 0,$$

where $B$ is a continuous function and denotes the cumulative age-specific
mortality rate for a baseline individual (i.e. $Z = 1$). If $B$ admits a
density $h$ relative to Lebesgue measure, then given the event $\{X \ge t\}$ and
$Z$ the conditional failure rate at time $t$ is given by

$$h(t;Z) = G'(t;Z)/(1 - G(t;Z)) = Z h(t), \quad t \ge 0.$$

The reader will recognize the conditional rate above as the landmark
proportional hazards model introduced to incorporate heterogeneity in
life-testing problems by Cox (1972). The function $h$ is the baseline hazard
and $Z$ is called a covariate.

Let $N = \{1(X \le t), t \ge 0\}$ denote the survival time counting process
and $F = \{\mathcal{F}_t, t \ge 0\}$ be the history defined by

$$\mathcal{F}_t = \sigma(Z) \vee \sigma(X \le s,\ s \le t).$$

Then according to the model above $N$ has compensator $A = \{A_t, t \ge 0\}$
relative to $F$ given by

$$A_t = \int_0^t Z\, 1(X \ge s)\, B\{ds\}.$$

We identify $Y_s = Z\, 1(X \ge s)$ in 2.1 and obtain that $N$ is a Poisson type
counting process.

Let $G$ denote the unconditional distribution measure of the random
variable $X$ so that

$$G(t) = P(X \le t) = 1 - \exp\{-\bar B(t)\}, \quad t \ge 0,$$

where $\bar B$ is some nonnegative continuous function which uniquely determines
$G$. We exploit the martingale approach to survival analysis and use the
innovation theorem (see for example Aalen (1978)) to determine $\bar B$ by
exchange of history. The problem may be reformulated as follows. Let
$F^N = \{\mathcal{F}^N_t, t \ge 0\}$ denote the internal history of $N$ given by

$$\mathcal{F}^N_t = \sigma(X \le s,\ s \le t).$$

Note that for each $t \ge 0$, $\mathcal{F}^N_t \subset \mathcal{F}_t$, and consider the
problem of determining the compensator $\bar A = \{\bar A_t, t \ge 0\}$ of $N$
relative to $F^N$. By theorem 18.3, Liptser and Shiryaev (1978), the
compensator $\bar A$ is given by

$$\bar A_t = \int_0^t z(s)\, 1(X \ge s)\, B\{ds\}$$

where $z(s) = E(Z \mid X > s)$, so that $\bar B$ is given by $\int z\, dB$. We identify
$Y_s = 1(X \ge s)$ and $\bar B$ with $B$ in 2.1 to obtain that $\bar A$ is of the
Poisson type. Note that the change of history from $F$ to $F^N$ leaves $N$ in the
family of Poisson type counting processes; this is a general result.

In some models of heterogeneity the conditional expectation $z$ is easy
to calculate. For example suppose $Y_0$ is a Gaussian random variable with
mean $\mu$ and variance $\sigma^2$. If $Z = Y_0^2$, then a direct argument using
Bayes' rule given in Yashin (1985) shows that the conditional density of $Y_0$
given the event $\{X > t\}$ is a Gaussian density with mean and variance
parameters $\mu[2\sigma^2 B(t) + 1]^{-1}$ and $\sigma^2[2\sigma^2 B(t) + 1]^{-1}$,
respectively. From this fact $z(t)$ is easily calculated to be
$\sigma^2[2\sigma^2 B(t) + 1]^{-1} + \mu^2[2\sigma^2 B(t) + 1]^{-2}$. ■
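The closed form for $z(t)$ can be checked numerically. The sketch below is only an illustration under the assumptions of this example ($Z = Y_0^2$ with $Y_0$ Gaussian, and $P(X > t \mid Y_0 = y) = \exp\{-y^2 B(t)\}$); the function names and the grid-based integration are choices made here, not part of the cited argument. It compares the formula with a brute-force evaluation of Bayes' rule for a given value `b` of $B(t)$.

```python
import math

def z_closed_form(b, mu, sigma2):
    """E(Z | X > t) for Z = Y0**2, Y0 ~ N(mu, sigma2), where b stands for B(t)."""
    c = 2.0 * sigma2 * b + 1.0
    return sigma2 / c + mu * mu / (c * c)

def z_numerical(b, mu, sigma2, half_width=12.0, n=20001):
    """Same quantity from Bayes' rule, integrating on a uniform grid.

    The unnormalised posterior weight of y is
    exp(-(y - mu)^2 / (2 sigma2)) * exp(-y^2 b); the grid spacing cancels
    in the ratio of the two sums.
    """
    sd = math.sqrt(sigma2)
    lo = mu - half_width * sd
    step = 2.0 * half_width * sd / (n - 1)
    num = den = 0.0
    for k in range(n):
        y = lo + k * step
        w = math.exp(-(y - mu) ** 2 / (2.0 * sigma2) - y * y * b)
        num += y * y * w
        den += w
    return num / den
```

With `b = 0` both functions return the unconditional second moment $\sigma^2 + \mu^2$, and as `b` grows the value shrinks, reflecting the fact that survivors tend to have small $Z$.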

Problems in survival analysis generate a counting process which counts a
single event: transition from an initial state to the state "failure." In
general, a univariate counting process, such as a renewal counting process,
counts the repeated occurrences of a single event over time. Often problems
in life history analysis involve multiple types of events occurring over time
so that univariate counting processes are not sufficiently general for their
study. For example, Markov chains are widely used in demography (e.g. Hoem
(1971)), econometrics (e.g. Singer (1981)), and illness-death models (e.g.
Mau (1986)) of broad appeal in insurance and medicine. The transitions among
the states of a Markov chain may be viewed as events of different types,
there being one event associated with each possible pairwise transition among
states of the chain. This demands a multivariate counting process, which is
essentially a collection of univariate counting processes where each member
of the collection is associated with a particular pairwise interstate
transition.

Example 2.3. Non-homogeneous Markov chains. We give a unified
treatment of discrete and continuous-time Markov chains. Let
$X = \{X_t, t \ge 0\}$ denote a Markov chain with finite state space $E$, defined
on a probability space $(\Omega, \mathcal{F}, P)$. Assume that the sample paths of
$X$ are right-continuous with left-hand limits. If $X$ is a discrete-time
chain, then $X$ is derived from a Markov chain $\{Y_n, n \ge 0\}$, say, by putting
$X_t = Y_n$ for $n \le t < n + 1$. Let $F^X = \{\mathcal{F}^X_t, t \ge 0\}$ denote the
internal history of $X$ given by

$$\mathcal{F}^X_t = \mathcal{F}_0 \vee \sigma(X_s,\ s \le t)$$

where $\mathcal{F}_0$ contains the P-null sets of $\mathcal{F}$ and their subsets.
Observe that in the discrete-time case $\mathcal{F}^X_t = \mathcal{F}^X_{[t]}$, where
$[t]$ denotes the largest integer in $t$.

If $X$ is a continuous-time Markov chain, then we assume that $X$ admits
the Q-matrix or intensity $Q(t) = (q_{ij}(t), i,j \in E)$ such that for all
$t \ge 0$ and $i \ne j \in E$

$$q_{ij}(t) \ge 0, \quad q_{ii}(t) \le 0, \quad \sum_j q_{ij}(t) = 0 \quad \text{and} \quad \int_0^t q_{ij}(s)\, ds < \infty.$$
In this case the transition probabilities $P(s,t)$ are given by the product
integral

$$P(s,t) = \prod_{(s,t]} (I + Q(u)\mu\{du\}), \quad 0 \le s \le t < \infty,$$

where $\mu$ denotes Lebesgue measure; see e.g. Aalen and Johansen (1978).

Alternatively, if $X$ is a discrete-time Markov chain, then we assume
that for each integer $n \ge 0$ $X$ admits the transition probability matrix
$P_n = (p_{ij}(n))$ such that for $i,j \in E$

$$p_{ij}(n) = P(X_{n+1} = j \mid X_n = i).$$

Using the same notation we define the discrete analog of the Q-matrix as
follows. For each $n \ge 0$ and $i \ne j \in E$ let $q_{ij}(t) = p_{ij}(n)$ if
$n \le t < n + 1$, $q_{ii}(t) = -\sum_{j \ne i} q_{ij}(t)$ and
$Q(t) = (q_{ij}(t), i,j \in E)$. Then it is easy to verify that the transition
probabilities $P(s,t)$ for the discrete-time chain are given by

$$P(s,t) = \prod_{s < n \le t} (I + Q(n)) = \prod_{s < u \le t} (I + Q(u)\mu\{du\}), \quad 0 \le s \le t < \infty,$$

where $\mu$ denotes counting measure with support $\{0,1,2,\dots\}$. If the product
is empty it is defined to be $I$.
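The discrete-time product formula can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names, the two-state chain `example_p` and the list-of-lists matrix representation are assumptions made here. It builds the discrete analog of the Q-matrix from each $P_n$ and multiplies the factors $I + Q(n)$ over the integers $n$ with $s < n \le t$.

```python
import math

def identity(k):
    """k x k identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def q_from_p(p):
    """Discrete analog of the Q-matrix: q_ij = p_ij (i != j), q_ii = -sum_{j != i} q_ij."""
    k = len(p)
    q = [[p[i][j] if i != j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k):
        q[i][i] = -sum(q[i])
    return q

def transition(p_of_n, s, t, k):
    """P(s,t) = product over integers n with s < n <= t of (I + Q(n)).

    An empty product returns the identity, as in the text.
    """
    result = identity(k)
    n = math.floor(s) + 1          # smallest integer strictly greater than s
    while n <= t:
        q = q_from_p(p_of_n(n))
        step = [[(1.0 if i == j else 0.0) + q[i][j] for j in range(k)]
                for i in range(k)]
        result = matmul(result, step)
        n += 1
    return result

def example_p(n):
    """Hypothetical time-dependent two-state transition matrix P_n."""
    a = 1.0 / (n + 2)
    return [[1.0 - a, a], [0.25, 0.75]]
```

Because the factors are multiplied in time order, the resulting matrices satisfy the Chapman-Kolmogorov relation $P(s,u)P(u,t) = P(s,t)$ by construction.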

Fix $i \ne j \in E$ and define $N(i,j) = \{N_t(i,j), t \ge 0\}$ to be a random
process which counts the number of direct transitions from $i$ into $j$ for
the Markov chain $X$. Thus for $t \ge 0$

$$N_t(i,j) = \sum_{0 < s \le t} 1(X_s = j,\ X_{s-} = i)$$

where $X_{s-} = \lim_{u \uparrow s} X_u$. Our object is to show that $N(i,j)$ is a
Poisson type counting process with compensator
$A(i,j) = \{A_t(i,j), t \ge 0\}$ relative to $F^X$ given by

$$A_t(i,j) = \int_{(0,t]} 1(X_{s-} = i)\, q_{ij}(s)\, \mu\{ds\}$$

where $\mu$ is Lebesgue measure for continuous-time chains and counting measure
for discrete-time chains.

By virtue of the Lévy formula (see e.g. Brémaud (1981)) it follows that
for any $0 \le s \le t$

$$E(N_t(i,j) - N_s(i,j) \mid \mathcal{F}^X_s) = E\Big[\sum_{s < u \le t} 1(X_u = j,\ X_{u-} = i) \,\Big|\, \sigma(X_s)\Big]$$
$$= E\Big[\int_s^t 1(X_{u-} = i)\, q_{ij}(u)\, \mu\{du\} \,\Big|\, \sigma(X_s)\Big]$$
$$= E(A_t(i,j) - A_s(i,j) \mid \mathcal{F}^X_s)$$

from which the martingale property for $N(i,j) - A(i,j)$ easily follows.

Note that by the Markov property conditioning with respect to $X_s$ is
equivalent to conditioning with respect to $\mathcal{F}^X_s$. Thus if we identify
$Y_s = 1(X_{s-} = i)$ and $B\{du\} = q_{ij}(u)\mu\{du\}$ in 2.1, then $N(i,j)$ is a
Poisson type counting process. The proof of this without using the Lévy
formula may be made directly in the discrete-time case, and Aalen and
Johansen (1978) give an alternative proof for the continuous-time case. ■

Consider  a  life-testing  situation  in  which  at  time  zero  a  component  is 
put  on  test  and  upon  failure  is  immediately  replaced  with  an  identical 
component  and  so  on.  The  lifetimes  generated  by  this  test  procedure  may  be 
modeled  as  an  ordinary  renewal  process  and  applications  can  be  found  in 
industrial  life-testing  and  animal  experimentation. 

Example 2.4. Renewal testing. Let $S = \{S_n, n \ge 0\}$ denote a renewal
process induced by the arbitrary distribution measure $G$. Define the renewal
counting process $\pi = \{\pi_t, t \ge 0\}$ given by

$$\pi_t = \sum_{n=1}^{\infty} 1(S_n \le t).$$

Define the history $F = \{\mathcal{F}_t, t \ge 0\}$ as follows. For $t \ge 0$

$$\mathcal{F}_t = \sigma(S_n \le s,\ s \le t,\ n \ge 1)$$

and consider the problem of finding the compensator $A = \{A_t, t \ge 0\}$ of
$\pi$ relative to $F$.

For each $n \ge 1$ let $X_n = S_n - S_{n-1}$ ($S_0 = 0$) so that
$X = \{X_n, n \ge 1\}$ is a sequence of independent random variables each with
distribution $G$. Thus the conditional distribution of $S_n$ given
$\mathcal{F}_{S_{n-1}}$ (see Brémaud (1981) for a definition) is given by $G_n$ where

$$G_n(t) = P(S_n \le t \mid \mathcal{F}_{S_{n-1}}) = G(t - S_{n-1}), \quad t \ge S_{n-1}.$$


Hence  according  to  proposition  3.1,  Jacod  (1975)  the  compensator  A  is  given 
by 


f  it  lYf'i 


where $L(t) = t - S_{\pi_{t-}}$ is a left-continuous version of the backwards
renewal time. Therefore, because of the presence of the $L$-process with its
well known saw-toothed sample paths, $\pi$ is not in general a Poisson type
counting process.

The regenerative structure of the compensator $A$ suggests that the
family of counting processes generated by the interrenewal times
$\{X_n, n \ge 1\}$ is of the Poisson type. For each $n \ge 1$ define a counting
process $N(n) = \{1(X_n \le t), t \ge 0\}$ and the history
$H = \{\mathcal{H}_t, t \ge 0\}$ by

$$\mathcal{H}_t = \sigma(X_n \le s,\ s \le t,\ n \ge 1).$$

Then by virtue of the independence of the $X_n$ it is possible to show directly
that $N(n)$ has compensator $A(n) = \{A_t(n), t \ge 0\}$ relative to $H$ given by

$$A_t(n) = \int_0^t 1(X_n \ge s)\, \frac{G\{ds\}}{1 - G(s-)}$$

which is clearly of the Poisson type. This may be proved either by an appeal
to proposition 3.1, Jacod (1975), by virtue of example 2.1, or by directly
verifying the martingale property. This representation is called the
sojourn-time approach by Phelan (1986b), who applies an extension of it to
problems of inference from Markov renewal processes. ■

Thus  far  only  example  2.1  involved  censoring.  In  practice  censored 
processes  or  incomplete  observations  are  the  rule  rather  than  the  exception. 
Therefore one yardstick of the utility of a given probability model for
analyzing life history data is its ability to incorporate general patterns of
censoring.  Poisson  type  counting  processes  meet  this  demand  as  is 
illustrated  below. 
Example 2.5. Censored processes. Let $N = \{N_t, \mathcal{F}_t, t \ge 0\}$ denote a
Poisson type counting process with compensator
$A = \{A_t, \mathcal{F}_t, t \ge 0\}$ where $F = \{\mathcal{F}_t, t \ge 0\}$ is a given
history. Let $C = \{C_t, \mathcal{F}_t, t \ge 0\}$ be a $\{0,1\}$-valued predictable
process used to model the censoring. Thus the counting process $N$ is
observable only on the set $\{t: C_t = 1\}$; otherwise we say the process is
censored. This implies that the observable counting process
$\tilde N = \{\tilde N_t, t \ge 0\}$ is given by the pathwise Stieltjes integral

$$\tilde N_t = \sum_{0 < s \le t} C_s\, \Delta N_s = \int_0^t C_s\, dN_s$$

which is often called the censored counting process. Since $C$ is bounded
and predictable, by the theory of stochastic integration with respect to
counting process martingales (see e.g. Liptser and Shiryaev (1978)) the
process defined by the pathwise Stieltjes integral

$$\tilde M_t = \int_0^t C_s\, (dN_s - dA_s), \quad t \ge 0,$$

is an $F$-(local) martingale. Hence $\tilde N$ has compensator
$\tilde A = \{\tilde A_t, t \ge 0\}$ relative to $F$ given by

$$\tilde A_t = \int_0^t C_s\, dA_s = \int_0^t C_s Y_s\, B\{ds\}$$

where $Y = \{Y_t, t \ge 0\}$ and $B$ are defined by 2.1. From this expression it is
evident that $\tilde N$ is a Poisson type counting process with auxiliary process
$\{C_t Y_t, t \ge 0\}$ and Borel measure $B$ which it inherits from $N$. ■


An example of a left-continuous (hence predictable) censoring process
$\{C_t = 1(U_i \ge t), t \ge 0\}$ was given in example 2.1. Censoring of Markov
chains is considered by Aalen and Johansen (1978) and Phelan (1986c) for
models in
continuous  and  discrete-time,  respectively,  and  for  the  renewal  process  of 
example  2.4  by  Phelan  (1986a).  A  general  discussion  of  censoring  is  found  in 
Gill  (1980),  and  in  the  context  of  nonparametric  tests  for  comparison  of 
counting  processes  in  Andersen  et  al.  (1982). 

One can construct numerous other examples of Poisson type counting
processes. For example, Brémaud (1981) constructs a G/M/1 queue using
Poisson counting processes. His departure process (see page 37) gives an
example of a censored homogeneous Poisson process and is therefore of the
Poisson type. Although we have not done so, it would be of interest to survey
stochastic models of natural phenomena which generate Poisson type counting
processes. Some examples that we are aware of include models for the mating
behavior of fruit flies (Aalen (1978)), labor-force dynamics (Andersen
(1985)) and screening carcinogenic chemicals in animal experiments (Nau
(1986)).

In  practical  problems  the  measure  B  is  unknown  and  requires 
estimation.  The  estimation  of  B  is  usually  based  on  observations  of  the 
bivariate  process  (N,Y)  over  a  period  of  time.  A  general  solution  to  this 
problem  involves  an  empirical  process  called  the  martingale  estimator  of  B. 
This  estimation  procedure  is  presented  next  and  is  applied  in  section  4  to 
solve  some  estimation  problems  drawn  from  the  models  developed  above. 

3. ESTIMATION FROM POISSON TYPE COUNTING PROCESSES. Let
$N = \{N_t, \mathcal{F}_t, t \ge 0\}$ denote a Poisson type counting process with
compensator $A = \{A_t, \mathcal{F}_t, t \ge 0\}$, auxiliary process
$Y = \{Y_t, \mathcal{F}_t, t \ge 0\}$ and measure $B$. Definition 2.1 is extended in the
following way. Let $J = \{t: \Delta B(t) > 0\}$, where
$\Delta B(t) = B(t) - B(t-)$, be the countable set to which $B$ assigns
positive mass. If $J$ is nonempty, then for each $t \in J$ we allow
$\Delta N(t) > 1$ with positive probability. This extension is used below where
the superposition of Poisson type counting processes has this property. If
the process $(N,Y)$ is observable over a period of time and $B$ is unknown, then
a statistical problem is to estimate $B$ from $(N,Y)$.

Define the predictable process $Y^+ = \{Y^+_t, t \ge 0\}$ given by

(3.1)  $Y^+_t = (Y_t)^{-1}\, 1(Y_t > 0)$  ($0/0 = 0$ by convention)

and the empirical process $\hat B = \{\hat B_t, t \ge 0\}$ given by the Stieltjes
integral

(3.2)  $\hat B_t = \int_0^t Y^+_s\, dN_s.$

The process $\hat B$ is the proposed estimator of $B$ and is called the martingale
estimator by virtue of the observation that the process

$$M_t = \int_0^t Y^+_s\, (dN_s - dA_s) = \hat B_t - B^*_t, \quad t \ge 0,$$

is a (local) martingale, where $B^*_t = \int_0^t 1(Y_s > 0)\, B\{ds\}$. This follows,
for example, by an appeal to theorem 18.7, Liptser and Shiryaev (1978).
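For a counting process with finitely many jumps the Stieltjes integral (3.2) is a finite sum, which makes the martingale estimator easy to compute. The following sketch is illustrative only: the function names and the toy data are assumptions made here, and `survival_inputs` builds the aggregate pair (N, Y) in the manner of example 2.1 from observed times and censoring indicators.

```python
def martingale_estimator(jumps, y_at):
    """Return [(s, B_hat_s)] evaluated at the jump times of N.

    jumps: iterable of (time, increment of N at that time)
    y_at:  function s -> Y_s, the (left-continuous) auxiliary process
    """
    path, total = [], 0.0
    for s, dn in sorted(jumps):
        y = y_at(s)
        y_plus = 1.0 / y if y > 0 else 0.0   # (3.1): Y+ = 1/Y on {Y > 0}, else 0
        total += y_plus * dn                 # (3.2): the integral is a sum over jumps
        path.append((s, total))
    return path

def survival_inputs(times, deltas):
    """Aggregate (N, Y) for censored survival data.

    times:  observed times min(X_i, U_i)
    deltas: 1 if the ith time is an observed failure, 0 if censored
    """
    jumps = {}
    for x, d in zip(times, deltas):
        if d == 1:
            jumps[x] = jumps.get(x, 0) + 1   # N jumps only at uncensored failures
    def y_at(s):
        return sum(1 for x in times if x >= s)   # number still at risk just before s
    return sorted(jumps.items()), y_at

# Hypothetical data: five units, two of them censored.
example_path = martingale_estimator(*survival_inputs([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

For these data the estimator jumps by 1/5, 1/4 and 1/2 at times 2, 3 and 5. In the survival setting this is the estimator usually called the Nelson-Aalen estimator.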
The statistical theory of the martingale estimator is based on
asymptotics. Suppose we are given a sequence $\{(N^n, Y^n), n \ge 1\}$ of Poisson
type counting processes and their associated auxiliary processes. For each
$n$ define $\hat B^n = \{\hat B^n_t, t \ge 0\}$ from $(N^n, Y^n)$ according to (3.2).
We consider the asymptotic (i.e. as $n \to \infty$) properties of the sequence of
estimators $\{\hat B^n, n \ge 1\}$. Suppose $\{a_n, n \ge 1\}$ is a sequence of
positive numbers tending to infinity as $n$ tends to infinity. If $Y^n/a_n^2$
converges uniformly to a function $y$ in probability as $n \to \infty$, where $y$ is
bounded from zero on $[0,a]$, say, and $(N^n, Y^n)$ is derived as the sum of
independent Poisson type counting processes, then the following properties
will typically hold:

a. Consistency. $\sup_{0 \le t \le a} |\hat B^n_t - B(t)| \to 0$ in probability as
$n \to \infty$;

b. Weak Convergence. For $n \ge 1$ define
$\mathcal{Y}^n = \{a_n(\hat B^n_t - B(t)), t \ge 0\}$; then $\mathcal{Y}^n$ converges weakly
to a Gaussian process $\mathcal{Y}$ of independent increments as $n \to \infty$. Weak
convergence takes place in the space $D([0,a])$ endowed with the Skorohod
topology (see Billingsley (1968)).

To prove these results one employs two fundamental tools: an inequality
due to Lenglart (1977) and functional central limit theorems for
semimartingales as developed in Jacod et al. (1982). To see why, observe that
for each $n \ge 1$ and $t \ge 0$

$$\hat B^n_t - B(t) = \hat B^n_t - B^{*n}_t + B^{*n}_t - B(t).$$

It has already been noted that $M^n = \{\hat B^n_t - B^{*n}_t, t \ge 0\}$ is a (local)
martingale, and $X^n = \{B^{*n}_t - B(t), t \ge 0\}$, being the difference between two
monotone processes, is a process of locally bounded variation. Hence
$\mathcal{Y}^n$ is a semimartingale (see Shiryaev (1981)). The conditions above may
be used to show directly that $a_n X^n$ converges to zero in probability as
$n \to \infty$. In this case the Lenglart inequality is applied to $M^n$ to prove
(a). Then martingale functional central limit theorems are applied to
$a_n M^n$ to prove (b). We omit the details but note that in our work we have
found it convenient to appeal to alternative criteria for tightness found in
Jacod and Memin (1980).

4. SURVEY OF ESTIMATION PROBLEMS. We give a survey of estimation
problems and results in the areas of life-testing and Markov chain analysis.
We begin with the problem of estimating an arbitrary life-distribution G in
life-testing models and then consider the problem of estimating transition
probabilities of a Markov chain. In our discussion we emphasize the
importance of the observation scheme, for example survival testing versus
renewal testing, and the role of product-limit estimators.

4.1 Estimating the life-distribution. Let $G$ denote an arbitrary
life-distribution and for $t \ge 0$ define
$B(t) = \int_0^t (1 - G(s-))^{-1}\, G\{ds\}$. For the problem of estimating $G$ we
distinguish among three observation schemes.
a) Survival testing. For each $n \ge 1$ we observe $n$ pairs
$(\tilde X_i, \delta_i)$, $i = 1,\dots,n$, of independent censored lifetimes and their
associated censoring indicator $1 - \delta_i$. In the notation of example 2.1,
define the aggregate processes $N^n = \{N^n_t, t \ge 0\}$ and
$Y^n = \{Y^n_t, t \ge 0\}$ by $N^n_t = \sum_{i=1}^n N_t(i)$ and
$Y^n_t = \sum_{i=1}^n 1(\tilde X_i \ge t)$, respectively. The statistic $(N^n, Y^n)$
is used below to construct the estimator of $G$. ■

b) Renewal testing. A single renewal process $S = \{S_n, n \ge 0\}$ is
observed over an expanding time horizon $\{[0,T], T > 0\}$. In the notation of
example 2.4, define the aggregate processes $N^T = \{N^T_t, t \ge 0\}$ and
$Y^T = \{Y^T_t, t \ge 0\}$ by

$$N^T_t = \sum_{n=1}^{\pi(T)} N_t(n) \quad \text{and} \quad Y^T_t = \sum_{n=1}^{\pi(T)} 1(X_n \ge t),$$

respectively. Here $(N^T, Y^T)$ is the relevant statistic for estimating $G$. ■

c) Renewal testing with finite horizon and repetitions. Fix $T > 0$.
For each $n \ge 1$ we observe $n$ independent renewal processes over $[0,T]$.
In the notation of example 2.4, let $\pi(i)$ and $\{X_k(i), k \ge 1\}$ denote the
renewal counting process and lifetimes, respectively, for the ith renewal
process, $i = 1,\dots,n$. Then define the aggregate processes
$\tilde N^n = \{\tilde N^n_t, t \ge 0\}$ and $\tilde Y^n = \{\tilde Y^n_t, t \ge 0\}$ by

$$\tilde N^n_t = \sum_{i=1}^{n} \sum_{\ell=1}^{\pi(i;T)} 1(X_\ell(i) \le t) \quad \text{and} \quad \tilde Y^n_t = \sum_{i=1}^{n} \sum_{\ell=0}^{\pi(i;T-t)} 1(X_{\ell+1}(i) \ge t),$$

respectively. The statistic $(\tilde N^n, \tilde Y^n)$, which is almost equivalent to
aggregating $(N^T, Y^T)$ over $n$ independent realizations, is used to estimate
$G$. ■


In accordance with (3.2), define

(4.0)  $\hat B^n_t = \int_0^t (Y^n_s)^+\, dN^n_s$

with $\hat B^T$ and $\tilde B^n$ defined analogously from $(N^T, Y^T)$ and
$(\tilde N^n, \tilde Y^n)$, respectively. The processes $\hat B^n$, $\hat B^T$ and
$\tilde B^n$ are the proposed estimators of the measure $B$ for observation
schemes (a), (b) and (c), respectively, and are used to construct
product-limit estimators of $G$. Define the processes
$\hat G^n = \{\hat G^n_t, t \ge 0\}$, $\hat G^T = \{\hat G^T_t, t \ge 0\}$ and
$\tilde G^n = \{\tilde G^n_t, t \ge 0\}$ by

(4.1)  $\hat G^n_t = 1 - \prod_{0 < s \le t} (1 - \Delta \hat B^n_s) = 1 - \prod_{0 < s \le t} \left(1 - \frac{\Delta N^n_s}{Y^n_s}\right)$

with $\hat G^T$ and $\tilde G^n$ defined analogously from $\hat B^T$ and $\tilde B^n$,
respectively. The processes $\hat G^n$, $\hat G^T$ and $\tilde G^n$ are the proposed
product-limit estimators of $G$ from the observation schemes (a), (b) and (c),
respectively. The estimator $\hat G^n$ was first formally introduced to the
statistical literature by Kaplan and Meier (1958), although its historical
origins appear to date earlier (see Gill (1980)), whereas $\hat G^T$ and
$\tilde G^n$ are natural extensions of $\hat G^n$.
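A minimal sketch of the product-limit construction (4.1) for observation scheme (a) follows. It is an illustration under the assumption that the data are pairs of observed times and failure indicators as in example 2.1; the function name and the toy data are not from the paper.

```python
def product_limit(times, deltas):
    """Product-limit estimate of G at each observed failure time.

    times:  observed times min(X_i, U_i)
    deltas: 1 if the ith time is an observed failure, 0 if censored
    Returns [(s, G_hat_s)] with G_hat_s = 1 - prod_{u <= s} (1 - dN_u / Y_u).
    """
    surv, path = 1.0, []
    failure_times = sorted({x for x, d in zip(times, deltas) if d == 1})
    for s in failure_times:
        dn = sum(1 for x, d in zip(times, deltas) if x == s and d == 1)  # jump of N at s
        y = sum(1 for x in times if x >= s)                              # number at risk at s
        surv *= 1.0 - dn / y        # y > 0 here, since the failing unit is itself at risk
        path.append((s, 1.0 - surv))
    return path

# Hypothetical data: five units, two of them censored.
example_g = product_limit([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

With no censoring the estimate reduces to the empirical distribution function, which is a convenient sanity check.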

For each $t$ such that $G(t) < 1$, lemma 18.8, Liptser and Shiryaev (1978)
implies that

(4.2)  $\frac{1 - \hat G^n_t}{1 - G(t)} = 1 - \int_0^t \frac{1 - \hat G^n_{s-}}{1 - G(s-)}\, f(\Delta B(s))\, (d\hat B^n_s - dB(s))$

where $f(\Delta B(s)) = (1 - \Delta B(s))^{-1}\, 1(\Delta B(s) < 1)$. Of course it is
possible to write analogous expressions involving $\hat G^T$ and $\tilde G^n$. It
turns out that these expressions are the key to proving the asymptotic
properties of the product-limit estimators since they either define a
martingale or can be well approximated by a martingale in probability.

The estimators $\hat G^n$, $\hat G^T$ and $\tilde G^n$ are consistent and the
normalized differences converge weakly to a Gaussian process of independent
increments as $n$, $T$ and $n$ tend to infinity, respectively. Essentially,
these estimators inherit these properties from the estimators $\hat B^n$,
$\hat B^T$ and $\tilde B^n$, as may be proved by the methods of section 3. A
detailed study of $\hat G^n$ is given by Gill (1980, 1983), although his proof of
weak convergence relies on an elaborate construction in theorem 4.2.2, Gill
(1980). An alternative proof of weak convergence for $\hat G^n$ is given by
Phelan (1986a), which is based on the methods outlined in section 3 and does
not rely on any special constructions. The problem of consistency and weak
convergence for the renewal testing estimator $\hat G^T$ is considered by Phelan
(1986a). His model includes right censoring of the interrenewal times and his
method is to show that the equivalent expression to (4.2) for $\hat G^T$ is well
approximated by a martingale in probability for large $T$. Then the
asymptotic (i.e. $T \to \infty$) properties of $\hat G^T$ are established in a manner
consistent with that used for $\hat G^n$. Finally, the estimator $\tilde G^n$ is
considered by Gill (1981) when $G$ is restricted to being either purely
discrete or continuous. He does not employ martingale techniques, although we
believe the approximation methods of Phelan (1986a) can be modified for this
purpose. This would unify the asymptotic treatment of $\tilde G^n$ with that of
$\hat G^n$ and $\hat G^T$.

In closing this subsection we recall the model of example 2.2 for
life-testing in heterogeneous populations or random environments. In the
proportional hazards model the random variable $Z$ depends on an unknown
parameter $\theta$, say, where inference for $\theta$ is also of interest. This is a
problem in the general theory of partial-likelihood (see Wong (1986)). In
life history analysis this problem has been considered from the modern point
of view using counting processes by Andersen and Gill (1982) (see also
Prentice and Self (1983)). These authors, of course, generalize the problem
to allow $Z$ to be a time-dependent stochastic process depending on $\theta$.

4.2. Estimating Markov transition probabilities. For fixed $T > 0$
let $P = (P(s,t), 0 \le s \le t \le T)$ denote the transition probabilities for a
nonhomogeneous Markov chain. Consider the problem of estimating $P$ under
the following observation scheme. For each $n \ge 1$ we observe $n$
independent Markov chains $X^i = \{X^i_t, 0 \le t \le T\}$, $i = 1,\dots,n$, each with
finite state space $E$, transition probabilities $P$ and arbitrary initial
distribution.

Let $\mu$ denote either counting measure or Lebesgue measure and suppose
$P$ admits a Q-matrix (cf. example 2.3) relative to $\mu$. For each $i,j \in E$
and $t \ge 0$ define $B_{ij}(t) = \int_0^t q_{ij}(s)\, \mu\{ds\}$ and let
$B(t) = (B_{ij}(t), i,j \in E)$. We begin by estimating the matrix function
$B = (B(t), t \ge 0)$. For $i \ne j \in E$ define $N^n(i,j) = \{N^n_t(i,j), t \ge 0\}$,
$Y^n(i) = \{Y^n_t(i), t \ge 0\}$ and $\hat B^n(i,j) = \{\hat B^n_t(i,j), t \ge 0\}$ by

$$N^n_t(i,j) = \sum_{k=1}^{n} \sum_{0 < s \le t} 1(X^k_s = j,\ X^k_{s-} = i), \quad Y^n_t(i) = \sum_{k=1}^{n} 1(X^k_{t-} = i) \quad \text{and}$$

$$\hat B^n_t(i,j) = \int_0^t (Y^n_s(i))^+\, dN^n_s(i,j)$$

and put $\hat B^n(i,i) = -\sum_{j \ne i} \hat B^n(i,j)$. The matrix valued process
$\hat B^n = (\hat B^n(i,j), i,j \in E)$ is the martingale estimator of the cumulative
rate matrix $B$ and is used to construct a product-limit estimator of $P$. For
$0 \le s \le t \le T$ define the product-limit estimator $\hat P$ by

$$\hat P(s,t) = \prod_{s < u \le t} (I + \Delta \hat B^n_u).$$

If the product is empty, then define $\hat P(s,t) = I$, the identity matrix. The
estimator $\hat P$ is an empirical transition probability matrix which satisfies
the Chapman-Kolmogorov equation and is the proposed estimator for discrete
and continuous-time Markov chains.
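In discrete time the product-limit estimator of the transition probabilities can be sketched directly from the definitions above. The code below is an illustration only, under assumptions made here: each observed chain is a list of states recorded at times 0, 1, ..., T, states are labelled 0, ..., k-1, there is no censoring, and `s`, `t` are integers. The function names are hypothetical.

```python
def identity(k):
    """k x k identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def increment(paths, u, k):
    """I + Delta B_hat at integer time u >= 1.

    Y_u(i) counts chains in state i just before u (that is, at time u - 1);
    the jump of N(i,j) at u counts chains moving from i to j at time u.
    """
    step = identity(k)
    for i in range(k):
        at_risk = sum(1 for p in paths if p[u - 1] == i)
        if at_risk == 0:
            continue                  # Y+ = 0 by (3.1): the row stays an identity row
        for j in range(k):
            if j != i:
                moves = sum(1 for p in paths if p[u - 1] == i and p[u] == j)
                step[i][j] = moves / at_risk
                step[i][i] -= moves / at_risk
    return step

def p_hat(paths, s, t, k):
    """P_hat(s,t) = product over s < u <= t of (I + Delta B_hat_u); identity if empty."""
    result = identity(k)
    for u in range(s + 1, t + 1):
        result = matmul(result, increment(paths, u, k))
    return result

# Hypothetical data: four two-state chains observed at times 0, 1, 2.
example_paths = [[0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0]]
example_p02 = p_hat(example_paths, 0, 2, 2)
```

Note that the estimate pools all chains occupying a state at each step, so for these data the (0,0) entry of `example_p02` is 4/9 rather than the raw fraction 1/3 of chains that start and end in state 0.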

For $i \ne j \in E$ define $B^{*n}(i,j) = \{B^{*n}_t(i,j), 0 \le t \le T\}$ by

$$B^{*n}_t(i,j) = \int_0^t 1(Y^n_s(i) > 0)\, q_{ij}(s)\, \mu\{ds\}$$

and $B^{*n}(i,i) = -\sum_{j \ne i} B^{*n}(i,j)$. Let
$B^{*n} = (B^{*n}_t(i,j), 0 \le t \le T, i,j \in E)$ and define the process $P^*$ by
the product integral

$$P^*(s,t) = \prod_{s < u \le t} \left(I + \frac{dB^{*n}}{d\mu}(u)\, \mu\{du\}\right), \quad 0 \le s \le t \le T.$$

According to theorem 3.1, Aalen and Johansen (1978), the following integral
equation is valid

$$M^n_t = \hat P(0,t)\, P^{*-1}(0,t) - I = \int_0^t \hat P(0,s-)\, (d\hat B^n_s - dB^{*n}_s)\, P^{*-1}(0,s), \quad 0 \le t \le T,$$

where $M^n = \{M^n_t, 0 \le t \le T\}$ is a matrix-valued process whose ijth element
is the sum of terms of the form

$$\int_0^t \hat P_{ik}(0,s-)\, (d\hat B^n_s(k,m) - dB^{*n}_s(k,m))\, P^{*-1}_{mj}(0,s).$$

It turns out that $M^n$ is a martingale and this fact is key to proving the
asymptotic properties of $\hat P$ (cf. equation (4.2) for $\hat G^n$).

The estimator $\hat P$ is uniformly consistent over $[0,T]$ and the normalized
difference $n^{1/2}(\hat P - P)$ converges weakly to a matrix-valued Gaussian
process of independent increments as $n \to \infty$. This is proved by Aalen and
Johansen (1978) and Phelan (1986b) in the continuous and discrete-time
setting, respectively. Their treatment is general enough to allow for general
patterns of censoring.

In closing this subsection we pose the problem of estimating the
sojourn-time distribution $G_i$ for each $i \in E$. This is a problem of
estimating a family of life-distributions. In fact a product-limit estimator
of $G_i$ can be constructed from $\hat B^n(i,i)$ (see Aalen and Johansen (1978))
and may be studied according to the methods of section 4.1.

5. DISCUSSION. In this paper we have surveyed some estimation problems
in life-testing and Markov chain analysis involving Poisson type counting
processes. Our discussion underscores the importance of martingale theory
and the product-limit estimator in providing for a unified theory and
methodology.

Other  inference  problems,  such  as  setting  confidence  bands,  hypothesis 
testing  and  comparison  of  sub-populations,  are  covered  by  some  of  the 
references  cited  herein. 
A HIERARCHICAL MULTISCALE PROCESSING OF IMAGES


B.  Gidas 
ABSTRACT

We describe a new method for Digital Image Processing. It is based on a
combination of Renormalization Group ideas and the Markov Random Field modeling
of images. It provides a unifying procedure for performing a hierarchical,
multiscale, coarse-to-fine analysis of image processing tasks such as restoration,
texture classification, coding, motion analysis, etc. The method has been tested by a
number of computer experiments. We report here two restoration experiments.

I.  The  Method. 

Image processing problems (restoration, segmentation, texture classification,
compression and coding, motion analysis, photomosaics, etc.), and Robotics Vision
(automatic object recognition), deal with cooperative features that exist and interact
on a large number of length scales - from the microscopic features of texture
consisting of elementary "grains" to the macroscopic features characterizing large scale
objects. Such multiscale, interdependent features appear in all situations of practical
interest: images obtained from aircraft, various types of satellite data, thermal
images, robot vision fields, photon emission tomography and scans from nuclear
magnetic resonance, etc.

(*) To appear in Proceedings of the Fourth Army Conference on Applied
Mathematics and Computing, May 27-30, 1986, Cornell University, Ithaca, NY.

(**) Partially supported by ARO DAAG-29-83-K-0116 and NSF Grant DMS
85-16230.


Our method [3] processes images in a multiscale, coarse-to-fine, hierarchical
fashion. It is based on a probabilistic modeling of images (a Bayesian approach
using Gibbs distributions [2]), and Renormalization Group ideas from Statistical
Physics and Quantum Field Theory [6]. The method is highly parallel and efficiently
implementable on parallel architectures. The procedure generates a (vertical) cascade
of images from a given image. The top level of the cascade is the original image,
while the bottom level of the cascade contains only the largest scale features of the
original image. Each intermediate level represents features of length scale larger
than the length scale of levels below it, and smaller length scale than the levels
above it.

The method consists of two major stages, the Renormalization stage and the
Processing stage. The general formulation of the method with a number of computer
experiments can be found in [3]. Here we present a simple form of the procedure
(and two restoration experiments). We describe first the Renormalization stage:
Given a $2^N \times 2^N$ image (to be, for example, restored, segmented, or coded), we
construct a sequence of $M < N$ images $L^{(k)}$ of size $2^{N-k} \times 2^{N-k}$, $k = 1, \ldots, M$
(see Figure 1; here the cascade appears horizontal rather than vertical). The original image
$L^{(0)}$ has the finest grid (of lattice spacing, say, 1). Each lower image $L^{(k)}$, $k = 1, \ldots, M$,
has a coarse grid of lattice spacing $2^k$. Each level $L^{(k)}$ is obtained from the
previous level $L^{(k-1)}$ by dividing $L^{(k-1)}$ into 2x2 disjoint cells and identifying each
cell by its center. The set of these centers constitutes the pixels of the level $L^{(k)}$. In
Figure 1, the dotted squares of $L^{(0)}$ are centered at crosses, which become the pixels
of $L^{(1)}$. The dotted squares of $L^{(1)}$ are centered at circles, which become the pixels
of $L^{(2)}$, and so on. One should think of each $L^{(k)}$, $k = 1, \ldots, M$, as being the
original image viewed from larger and larger distances.

Figure 1: Cascade levels

Each image $L^{(k)}$, $k = 0, \ldots, M$, is associated with a Gibbs distribution $P^{(k)}$.
$P^{(0)}$ is the prior or posterior distribution of $L^{(0)}$, depending on whether $L^{(0)}$ is
undegraded or degraded. The distribution $P^{(0)}$ is estimated from the given data
(and the degradation characteristics, if the data are degraded). The distribution $P^{(1)}$
is obtained from $P^{(0)}$ via a Renormalization Group transformation R. Similarly, $P^{(2)}$
is obtained from $P^{(1)}$ via R, and so on. At each level $k = 1, \ldots, M$, the image $L^{(k)}$
together with the renormalization group transformation R preserve all the information
contained in the original image $L^{(0)}$.

The renormalization group transformation R is specified in terms of certain
conditional probabilities Q as follows: Consider the i-th cell of level k-1 (see Figure
2). Let $x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, x_i^{(4)}$ be the gray levels at the four pixels of the i-th cell
of $L^{(k-1)}$ (if the original image is degraded, the $x_i^{(\alpha)}$'s are unknown). The center of
the i-th cell (denoted by a cross in Figure 2) will become the i-th pixel of the k-level.
The gray level $x'_i$ of this pixel in $L^{(k)}$ is chosen randomly from a conditional
probability $Q(x'_i \mid x_i^{(1)}, \ldots, x_i^{(4)})$. An example of such a conditional probability is

$$Q(x'_i \mid x_i^{(1)}, \ldots, x_i^{(4)}) = \frac{\exp\left[p\, x'_i \left(x_i^{(1)} + \cdots + x_i^{(4)}\right)\right]}{\sum_{y} \exp\left[p\, y \left(x_i^{(1)} + \cdots + x_i^{(4)}\right)\right]} \tag{1}$$

where the sum in the denominator runs over the possible gray levels $y$ of the new pixel,
and p is an arbitrary parameter. If the gray levels are binary, i.e., $x_i = \pm 1$, then
taking $p \to +\infty$ in (1), we obtain the "majority rule": If the majority of the $x_i^{(\alpha)}$'s is ±1,
then $x'_i$ is ±1, respectively. If there is a tie among the $x_i^{(\alpha)}$'s, then $x'_i$ is chosen ±1
with probability 1/2. Let $x = \{x_i : i \in L^{(k-1)}\}$ and $x' = \{x'_i : i \in L^{(k)}\}$ be the
gray level configurations of $L^{(k-1)}$ and $L^{(k)}$, respectively. Then the Renormalization
Group transformation R is defined by

$$P^{(k)}(x') = \left(R P^{(k-1)}\right)(x') = \sum_{x} R(x' \mid x)\, P^{(k-1)}(x) \tag{2}$$

where

$$R(x' \mid x) = \prod_{i \in L^{(k)}} Q(x'_i \mid x_i^{(1)}, \ldots, x_i^{(4)}) \tag{3}$$

Figure 2: The i-th cell of the (k-1)-level

If the gray levels $x_i$ are continuous, the sums in (1) and (2) should be replaced by
integrals.
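
The block-averaging step can be illustrated with a minimal Python sketch for binary (±1) images stored as lists of lists. It samples each coarse pixel from the example probability (1) and reduces to the majority rule when `p` is infinite. The function names (`sample_coarse_pixel`, `renormalize`, `build_cascade`) and the image representation are assumptions made for illustration, not part of the original method description; the sketch only produces the grids $L^{(k)}$ and does not compute the renormalized distributions $P^{(k)}$.

```python
import math
import random

def sample_coarse_pixel(cell, p, rng):
    """Draw x' in {-1, +1} with probability proportional to exp(p * x' * sum(cell))."""
    s = sum(cell)
    if math.isinf(p):
        # p -> +infinity: majority rule, ties broken with probability 1/2.
        if s == 0:
            return rng.choice((-1, 1))
        return 1 if s > 0 else -1
    # P(x' = +1) = e^{ps} / (e^{ps} + e^{-ps}), evaluated in a numerically stable way.
    z = 2.0 * p * s
    if z >= 0:
        prob_plus = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        prob_plus = ez / (1.0 + ez)
    return 1 if rng.random() < prob_plus else -1

def renormalize(image, p=math.inf, rng=None):
    """Map a 2n x 2n image to an n x n image using disjoint 2x2 cells."""
    rng = rng or random.Random(0)
    n = len(image) // 2
    coarse = []
    for r in range(n):
        row = []
        for c in range(n):
            cell = [image[2 * r][2 * c], image[2 * r][2 * c + 1],
                    image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]]
            row.append(sample_coarse_pixel(cell, p, rng))
        coarse.append(row)
    return coarse

def build_cascade(image, levels, p=math.inf, rng=None):
    """Return [L0, L1, ..., L_levels], where L0 is the original image."""
    rng = rng or random.Random(0)
    cascade = [image]
    for _ in range(levels):
        cascade.append(renormalize(cascade[-1], p, rng))
    return cascade
```

For a 4 x 4 input and `levels=2`, `build_cascade` returns grids of side 4, 2 and 1, mirroring the lattice spacings 1, 2 and 4 described above.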

In general, there is [3] a freedom in choosing the cells and the conditional
probabilities Q (i.e., the cells need not be squares, and Q need not be taken of the
form (1)). Any a priori knowledge about the original image can be accommodated in the
modeling of $L^{(0)}$, as well as in the choice of the cells and the conditional
probabilities Q.

The above stage of constructing the renormalized Gibbs distributions $P^{(k)}$,
$k = 1, \ldots, M$, from $P^{(0)}$ is the Renormalization stage of our procedure (in this stage
we start from the top level $L^{(0)}$ of the cascade and proceed towards the bottom level
$L^{(M)}$ of the cascade). Next comes the Processing stage of our procedure. In imaging
tasks such as restoration, texture classification, coding, etc., we start our processing
from the bottom of the cascade. That is, we first process the coarsest-grid image
$L^{(M)}$ (which contains large scale features only, and has very few degrees of freedom).
Then we go upwards. We transmit the processed information from level M to level
M-1, and process the new (smaller scale) features which appear in $L^{(M-1)}$ but not in
$L^{(M)}$. We continue the process until we reach $L^{(0)}$, and thus process all the fine
details of $L^{(0)}$.

During the k-th step of the processing stage (i.e., in going from level $L^{(k)}$ to level
$L^{(k-1)}$) the number of possible intensity images at the (k-1)-level constrained by the
processed information at the k-level is much smaller than the number of all intensity
images at the (k-1)-level without any constraint. This drastically reduces the number of
computational steps needed to determine the (k-1)-level. This multiscale, coarse-to-fine
processing of images results in rapid convergence and a reduction of the
computational cost.

The present approach to image processing problems is reminiscent of the
pyramid structures [1] and of the multi-grid method in partial differential
equations [4,5]. However, our procedure is fundamentally different from these
schemes, as are its most important properties.

In restoration problems, we often combine the above procedure with the
annealing algorithm: The posterior distribution of $L^{(0)}$ depends on the
temperature T, i.e., $P^{(0)} = P_T^{(0)}$. Here the T-dependence enters through $\frac{1}{T}H$, where
H is the underlying energy function. The T-dependence of the subsequent
(renormalized) distributions $P_T^{(k)}$ is more complicated. The bottom level M of the
cascade (which has very few degrees of freedom) can be restored by applying the
annealing algorithm to $P_T^{(M)}$ (quite often, however, this level can be restored by a
simple determination of the lowest energy). Having restored a level k
(k = M, M-1, ..., 1), we restore the (k-1)-level by applying the annealing algorithm to
the conditional probability

$$P_T\left(x^{(k-1)} \mid x^{(k)}\right) = R\left(x^{(k)} \mid x^{(k-1)}\right) \frac{P_T^{(k-1)}\left(x^{(k-1)}\right)}{P_T^{(k)}\left(x^{(k)}\right)} \tag{4}$$

where $x^{(k)}$ denotes the gray intensities of the k-level (already restored), and
$x^{(k-1)}$ the gray intensities of the (k-1)-level (to be restored).

At each level of the restoration stage (i.e., in going from $L^{(k)}$ to $L^{(k-1)}$) we
choose an annealing schedule of the form

$$T(t) = \frac{T_0}{1 + \log t}, \qquad t = 1, 2, \ldots \tag{5}$$

The initial temperature $T_0$ need not be the same at all cascade levels. In fact,
choosing $T_0$ to increase as we move from the bottom of the cascade (coarse grid)
toward the top of the cascade (fine grid), the algorithm is somewhat faster. In our
experiments (Section II), we chose $T_0$ to be the same at all cascade levels.
However, this $T_0$ is in general smaller than the $T_0$ needed for a direct annealing
of the fine grid level only. There is a theoretical justification of this fact:
the renormalization group "trajectories" [6] move towards the trivial zero-temperature
"fixed point" as $T(t) \to 0$. Also, the overall convergence of the present procedure is
(in general) faster than the convergence of annealing applied directly to the fine grid
distribution $P_T^{(0)}$.
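
As a small illustration of the schedule (5), the following Python sketch evaluates the temperature at sweep t. The function name `annealing_temperature` is an assumption for illustration, and the natural logarithm is assumed since the text does not state the base.

```python
import math

def annealing_temperature(t, t0):
    """Schedule (5): T(t) = T0 / (1 + log t) for sweep index t = 1, 2, ..."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return t0 / (1.0 + math.log(t))

# Temperatures for five sweeps starting from T0 = 1.5, the values used at each
# cascade level in the second experiment of Section II.
temperatures = [annealing_temperature(t, 1.5) for t in range(1, 6)]
```

The schedule starts at $T_0$ for t = 1 and decreases slowly, which is why only a few sweeps per level change the temperature modestly.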

II.  Experiments 

The image processing method outlined in Section I has been tested by a
number of computer experiments [3]. Figures 3 and 4 show two restoration
experiments.

Figure 3

The original signal (a) is a binary, "hand-drawn" signal with 1025 pixels. The
degraded data (b) were obtained by adding Gaussian noise of mean zero and
variance $\sigma^2 = 1$. $(c_7)$ - $(c_0)$ represent eight restoration levels. Notice that the small
pieces at the center and end of the original signal do not appear until level $(c_3)$.
These pieces have a length scale smaller than the length scale of the "features"
contained in $(c_7)$ - $(c_4)$.

In this example, equation (2) can be solved exactly. The resulting algorithm is
deterministic (i.e., no annealing or stochastic relaxation is needed), and extremely
efficient.

Figure 4

The original image (a) is a 64 x 64 binary image. It was generated by the
"spin-flip" algorithm. The degraded data (b) were generated by adding Gaussian
noise of mean zero and variance $\sigma^2 = 5$. $(c_2)$ - $(c_0)$ represent three restoration
levels: $(c_2)$ 16 x 16, $(c_1)$ 32 x 32, and $(c_0)$ 64 x 64. At each cascade level we
used the annealing algorithm (applied to (4)) with an initial temperature $T_0 = 1.5$,
and performed five sweeps per level. For comparison, we restored the degraded
image (b) by applying the annealing algorithm directly at the top level (64 x 64).
With an initial temperature $T_0 = 3$ and 100 sweeps, the result of the annealing was
not as good as the result of our procedure.

The final restoration $(c_0)$ of the present procedure, although satisfactory,
contains some noise at the boundaries of the various regions. This noise could be
eliminated by using $(c_0)$ as the initial configuration of a deterministic descent
algorithm. Since $(c_0)$ is very near to the true "global minimum", any deterministic
descent algorithm starting from $(c_0)$ would reach the true global minimum in a small
number of iterations.

Abstract. We consider the maximum entropy method of expert system construction.
We show that the construction of the expert system is equivalent to the minimization
of a convex function in as many dimensions as there are pieces of knowledge supplied
to the system. We show that in the case where the knowledge presented to the system is
self-contradictory, the minimization of this function creates an expert system for a set of
constraints that is consistent and 'close' to the original inconsistent constraints. Monte Carlo
methods for minimizing the function are discussed, and illustrated by computer experiment.
One of the examples given suggests an approach to the problem of invariant optical
character recognition.

Introduction. An expert system is designed to answer questions. We consider
probabilistic expert systems: if the system is given an event, it should be able to calculate
its probability. Such an expert system is actually a distribution on the set of all events
we wish to consider. Typically the knowledge the system is based on will be insufficient to
answer all questions. In many cases we wish to consider, the sheer size of the state space
precludes such knowledge. A medical expert system could be asked for the probability of
a disease given some combination of symptoms, yet the set of all possible combinations of
symptoms is huge, and the knowledge base cannot be expected to contain all the different
probabilities. We desire our system to answer questions even in such cases, and to do so
in a reasonable manner, much like a human expert would. For this purpose we consider
'the principle of maximum entropy'. Of all the distributions which satisfy the knowledge
supplied to the system, we will pick the one with maximum entropy to be our expert system.
The entropy H of a distribution p is defined as

$$H(p) = -\sum_{\omega \in \Omega} p(\omega) \log p(\omega)$$

where $\omega$ is an event, $\Omega$ is the set of all events, and $p(\omega)$ is the probability of the event
$\omega$. Entropy has an information-theoretic meaning; the distribution with maximum entropy
can be viewed as the one containing the least knowledge. By maximizing the entropy over
all distributions that agree with the knowledge base, we are picking as our expert system
the distribution that makes the fewest unnecessary 'assumptions'. For more information
regarding the justification of the principle of maximum entropy, we refer the reader to [1].
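
The entropy definition can be made concrete with a short Python sketch. The function name `entropy` and the convention that terms with zero probability contribute nothing are assumptions for illustration. With no constraints other than normalization, the uniform distribution attains the largest value, log of the number of events.

```python
import math

def entropy(p):
    """H(p) = -sum_w p(w) log p(w), using the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]
```

Here `entropy(uniform)` equals log 4, `entropy(skewed)` is smaller, and `entropy(certain)` is zero, which matches the reading of entropy as a measure of how little knowledge a distribution encodes.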

1. Knowledge. The construction of a probabilistic expert system begins with knowledge.
We classify as knowledge anything that answers probabilistic questions; we think of
a probabilistic question as a function of probabilities and we consider an answer to be the
value we say the function will take on. (This brings up a more general way to view knowledge;
we could view an answer as specifying that the function that defines the question
has a value in a certain range. We will not be using this type of answer.) Using these ideas,
we see that the knowledge supplied to the system can be broken up into distinct 'pieces' of
knowledge, each of which corresponds to a distinct probabilistic question and its answer.
Each piece of knowledge is a constraint that must be satisfied by the expert system; if the
answer to the question we ask the system is included in the knowledge base, then the expert
system's answer is constrained to duplicate it. We can write a constraint in its most general
form as

$$B(p) = c$$

We consider a restriction of the above to the case where B is a linear function of p; our
constraint can thus be written as

$$\sum_{\omega \in \Omega} b(\omega)\, p(\omega) = c$$

The above constraint is the same as specifying that the expected value of b is c. Since
$c = \sum_{\omega \in \Omega} c\, p(\omega)$, we consider the function a, where $a(\omega) = b(\omega) - c$, and we rewrite
the above constraint as

$$\sum_{\omega \in \Omega} a(\omega)\, p(\omega) = 0$$

It may seem that this form is very restricted, but it is sufficient for several important types
of constraints ([3], [4]). It is capable of representing any piece of knowledge that can be put
in terms of the expected value of a function; it can thus represent knowledge about marginal,
joint and conditional probabilities. To illustrate this, consider the following example:

$$p(\omega \in S_1 \mid \omega \in S_2) = .5$$

Using Bayes's rule, we can rewrite the above as

$$\frac{p(\omega \in S_1 \cap S_2)}{p(\omega \in S_2)} = .5$$

This can be written as

$$p(\omega \in S_1 \cap S_2) - .5\, p(\omega \in S_2) = 0$$

which is the same as

$$\sum_{\omega \in \Omega} \left(\chi_{S_1 \cap S_2}(\omega) - .5\, \chi_{S_2}(\omega)\right) p(\omega) = 0,$$

where $\chi_S$ denotes the indicator function on the set of events in S; when $\omega \in S$ we will have
$\chi_S(\omega) = 1$, and when $\omega \notin S$ we will have $\chi_S(\omega) = 0$.
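
The conversion of a conditional-probability statement into a zero-expectation constraint can be sketched in Python. The helper names (`indicator`, `conditional_constraint`, `expected_value`) and the four-event state space are hypothetical and chosen only to illustrate the rewriting above.

```python
def indicator(subset):
    """Indicator function of a set of events."""
    return lambda w: 1.0 if w in subset else 0.0

def conditional_constraint(s1, s2, c):
    """Return a(w) such that sum_w a(w) p(w) = 0 encodes p(w in S1 | w in S2) = c."""
    chi_both = indicator(s1 & s2)
    chi_s2 = indicator(s2)
    return lambda w: chi_both(w) - c * chi_s2(w)

def expected_value(a, p):
    """Expected value of a under a distribution given as {event: probability}."""
    return sum(a(w) * pw for w, pw in p.items())

# A distribution on four events for which p(S1 | S2) = 0.2 / 0.4 = 0.5.
p = {0: 0.4, 1: 0.2, 2: 0.2, 3: 0.2}
a = conditional_constraint({0, 1}, {1, 2}, 0.5)
residual = expected_value(a, p)  # zero when the constraint is satisfied
```

A nonzero `residual` measures by how much a candidate distribution violates the constraint, which is the quantity driven to zero in the sections that follow.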


2. Lagrange Multipliers. Recall our goal. We wish to find a distribution that satisfies
a set of constraints, and has higher entropy than any other such distribution. Using the
form for knowledge that we introduced in the previous section, we can state the problem
as follows:

$$\max_{p} \left(-\sum_{\omega \in \Omega} p(\omega) \log p(\omega)\right) \tag{1}$$

over all p satisfying

$$\sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = 0, \qquad i = 1, \ldots, m$$

$$\sum_{\omega \in \Omega} p(\omega) = 1$$

$$p(\omega) \ge 0, \qquad \omega \in \Omega$$

With suitable care, we can use Lagrange multipliers to reduce the above constrained
problem to an unconstrained problem. In order to apply the theory of Lagrange multipliers
we must add some assumptions. For the complete details, we refer the reader to [2]; here
we will note that the required assumptions are that

$$\{a_i(\cdot)\} \text{ are linearly independent vectors,}$$

$$\exists\, p \text{ such that } \sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = 0 \;\; \forall i \text{ and } p(\omega) > 0 \;\; \forall \omega \in \Omega \tag{2}$$

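The Lagrangian is

$$\sum_{\omega \in \Omega} p(\omega) \log p(\omega) + \sum_{i=1}^{m} \lambda_i \left(\sum_{\omega \in \Omega} a_i(\omega)\, p(\omega)\right) + \delta \left(\sum_{\omega \in \Omega} p(\omega) - 1\right)$$

We know from Lagrange multiplier theory that there exist $\lambda = \{\lambda_1, \ldots, \lambda_m\}$ and $\delta$ such
that the derivative of the Lagrangian (with respect to $\lambda_i$, $\delta$ and $p(\omega)$) at $\lambda, \delta$ is zero, and
that such $\lambda, \delta$ define local extrema. Performing some algebraic manipulations, we arrive at
the following equations

$$p(\omega) = \exp\left(-\sum_{i=1}^{m} \lambda_i a_i(\omega)\right) \Big/ \sum_{\omega' \in \Omega} \exp\left(-\sum_{i=1}^{m} \lambda_i a_i(\omega')\right) \qquad \forall \omega \in \Omega$$

$$\frac{\partial}{\partial \lambda_k} \sum_{\omega \in \Omega} \exp\left(-\sum_{i=1}^{m} \lambda_i a_i(\omega)\right) = 0, \qquad k = 1, \ldots, m \tag{3}$$

The function $\sum_{\omega \in \Omega} \exp\left(-\sum_{i=1}^{m} \lambda_i a_i(\omega)\right)$ will be denoted $Z(\lambda)$, where $\lambda = \{\lambda_1, \ldots, \lambda_m\}$. The
function Z is sometimes called the partition function.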

Notice that Z is a convex function. Hence there is at most one $\lambda$, corresponding to
the global minimum of Z, at which Z has an extremal point (i.e., $\partial Z / \partial \lambda_i = 0 \;\forall i$).
Under the assumptions (2) we know that such a $\lambda$ must exist, hence the maximum entropy
distribution exists and is unique. An interesting property of Z is that when the assumptions
(2) do not hold, Z has no extremal point (see [2] for the details). Hence, if we try to minimize
Z and succeed, we have found the maximum entropy distribution (since the maximum
entropy distribution is defined, through (3), by the $\lambda$ at which the minimum occurs). Our
computational goal (section 4) will thus be to minimize Z. We note that there have been
many ideas and methods for the computation of the maximum entropy distribution, some
involving Lagrange multipliers, others not; some examples are [1], [5]-[7]. The method we
use is based on work by Geman [3] and Geman [4].
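
For a state space small enough to enumerate, the relationship between the minimizer of Z and the maximum entropy distribution can be checked numerically. The following Python sketch uses plain fixed-step gradient descent, which is an assumption made for simplicity and not the step-size rule described in section 4; the function names and the three-event example (constraint: the expected value of $\omega$ is 0.5) are likewise hypothetical.

```python
import math

def weights(lam, a_funcs, omega):
    """Unnormalized weights exp(-sum_i lam_i a_i(w)) for each event w."""
    return [math.exp(-sum(l * a(w) for l, a in zip(lam, a_funcs))) for w in omega]

def gibbs(lam, a_funcs, omega):
    """The distribution p_lambda(w) = weight(w) / Z(lambda)."""
    ws = weights(lam, a_funcs, omega)
    z = sum(ws)
    return [w / z for w in ws]

def grad_partition(lam, a_funcs, omega):
    """Gradient of Z: dZ/dlam_i = -sum_w a_i(w) exp(-sum_j lam_j a_j(w))."""
    ws = weights(lam, a_funcs, omega)
    return [-sum(a(w) * wt for w, wt in zip(omega, ws)) for a in a_funcs]

def minimize_partition(a_funcs, omega, step=0.1, iters=2000):
    """Fixed-step gradient descent on Z, started at lambda = 0."""
    lam = [0.0] * len(a_funcs)
    for _ in range(iters):
        g = grad_partition(lam, a_funcs, omega)
        lam = [l - step * gi for l, gi in zip(lam, g)]
    return lam

omega = [0, 1, 2]
a_funcs = [lambda w: w - 0.5]          # encodes E[w] = 0.5
lam = minimize_partition(a_funcs, omega)
p = gibbs(lam, a_funcs, omega)
mean = sum(w * pw for w, pw in zip(omega, p))
```

At the computed minimizer, `mean` agrees with the required value 0.5, so the distribution of the form (3) satisfies the constraint exactly where the gradient of Z vanishes.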


3. Contradictions. Let us consider the case where the assumptions (2) do not hold.
We will still assume that the $a_i$ are linearly independent. Each $a_i$ will usually be a simple function
of $\omega$ (for example, $a_i$ is often a combination of indicator functions); in such cases independence
is relatively easy to verify. If the constraints are dependent, some can be removed so
as to provide independence. Hence, the restriction that the $a_i$ be independent is often easy
to satisfy.

More hazardous is the assumption that

$$\exists\, p \text{ such that } \sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = 0 \;\; \forall i \text{ and } p(\omega) > 0 \;\; \forall \omega \in \Omega$$

This assumption can fail in two fundamentally different ways. The first occurs when there
exist distributions p that satisfy the constraints (so $\sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = 0$), but all such
distributions assign probability zero to some events. The other way this assumption can
fail is when there exists no p that satisfies the constraints. This is the case when the
constraints are self-contradictory. In both of the above situations we can show that trying
to minimize the partition function (i.e., driving the gradient of Z close to zero) and using
the form (3) for the probabilities found by the use of Lagrange multipliers does something
useful.

The original constraints we supplied the system with were

$$\sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = 0, \qquad i = 1, \ldots, m \tag{4}$$

Consider the system of constraints

$$\sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = \epsilon_i, \qquad i = 1, \ldots, m \tag{5}$$

When $\|\epsilon\| = (\epsilon_1^2 + \cdots + \epsilon_m^2)^{1/2}$ is small enough, we would expect the two systems of constraints
to be interchangeable. Now, let $p_\lambda$ be defined as follows

$$p_\lambda(\omega) = \exp\left(-\sum_{i=1}^{m} \lambda_i a_i(\omega)\right) \Big/ Z(\lambda)$$

where Z is defined with respect to the constraints (4). If Z has an extremum at $\lambda$ then $p_\lambda$ is
the maximum entropy distribution for the constraints (4). For any $\lambda$, we can show (see [2])
that $p_\lambda$ is the maximum entropy distribution for the system of constraints

$$\sum_{\omega \in \Omega} a_i(\omega)\, p(\omega) = \sum_{\omega \in \Omega} a_i(\omega) \exp\left(-\sum_{i'=1}^{m} \lambda_{i'} a_{i'}(\omega)\right) \Big/ Z(\lambda) = -\nabla_i Z(\lambda) / Z(\lambda)$$

where $\nabla_i$ is the i-th component of the gradient. So, if, for a given $\lambda$, $\|\nabla Z(\lambda)/Z(\lambda)\|$ is small
(corresponding to $\|\epsilon\|$ being small in (5)), then $p_\lambda$ is the maximum entropy distribution for
a system of constraints that is close to the original system of constraints.
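
The identity between the constraint values under $p_\lambda$ and $-\nabla_i Z(\lambda)/Z(\lambda)$ can be checked numerically on a small state space. In this Python sketch the gradient is approximated by central differences so that the check is independent of the analytic formula; the function names, the step `h`, and the two example constraint functions are assumptions for illustration.

```python
import math

def partition(lam, a_funcs, omega):
    """Z(lambda) = sum_w exp(-sum_i lam_i a_i(w))."""
    return sum(math.exp(-sum(l * a(w) for l, a in zip(lam, a_funcs))) for w in omega)

def constraint_values(lam, a_funcs, omega):
    """The values epsilon_i = sum_w a_i(w) p_lambda(w)."""
    z = partition(lam, a_funcs, omega)
    values = []
    for a in a_funcs:
        total = sum(a(w) * math.exp(-sum(l * b(w) for l, b in zip(lam, a_funcs)))
                    for w in omega)
        values.append(total / z)
    return values

def numeric_grad(lam, a_funcs, omega, h=1e-6):
    """Central-difference approximation of the gradient of Z."""
    grad = []
    for i in range(len(lam)):
        up = list(lam)
        dn = list(lam)
        up[i] += h
        dn[i] -= h
        grad.append((partition(up, a_funcs, omega) - partition(dn, a_funcs, omega)) / (2 * h))
    return grad

omega = [0, 1, 2, 3]
a_funcs = [lambda w: w - 1.0, lambda w: w * w - 2.0]
lam = [0.3, -0.1]
eps = constraint_values(lam, a_funcs, omega)
z = partition(lam, a_funcs, omega)
minus_grad_over_z = [-g / z for g in numeric_grad(lam, a_funcs, omega)]
```

The two lists `eps` and `minus_grad_over_z` agree to within the finite-difference error, so the norm of $\nabla Z/Z$ directly reports how far $p_\lambda$ is from satisfying the original constraints (4).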

Hence, our desire is to find a $\lambda$ such that $\|\nabla Z(\lambda)/Z(\lambda)\|$ is small. In light of this, let
us examine the cases where the assumptions (2) do not hold. When a system of constraints
has as its only solutions distributions p that assign probability zero to some events, we
can show (see [2]) that $Z(\lambda)$ is bounded below by 1. Hence, all we need to do is make
the gradient of Z arbitrarily small, and we will have found a $\lambda$ that defines a maximum
entropy distribution which satisfies constraints arbitrarily close to those originally supplied.
When the constraints are contradictory, we can show (see [2]) that when $\nabla Z$ goes to zero, Z
will also. But we can also show (see [2]) that using a continuous gradient descent method
(define $\lambda(t)$ by the O.D.E. $\frac{d}{dt}\lambda_i(t) = -\nabla_i Z(\lambda(t)) / \|\nabla Z(\lambda(t))\|$, with the initial condition
$\lambda_i(0) = 0 \;\forall i$) to minimize Z yields a path $\lambda(t)$ such that $\|\nabla Z(\lambda(t))/Z(\lambda(t))\|$ decreases as
t increases. In this sense, we get a maximum entropy distribution for a consistent set of
constraints that approximates the inconsistent set.
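
The behavior on contradictory constraints can be illustrated with a forward-Euler discretization of the normalized gradient flow. This Python sketch is only an approximation of the continuous scheme; the step `dt`, the function names, and the two-event example are assumptions. The example constraint function $a(\omega) = \omega + 1$ on $\Omega = \{0, 1\}$ has expected value between 1 and 2 under every distribution, so the constraint that it equal 0 cannot be met.

```python
import math

def z_and_grad(lam, a_funcs, omega):
    """Return Z(lambda) and its gradient by direct summation over omega."""
    z = 0.0
    grad = [0.0] * len(lam)
    for w in omega:
        a_vals = [a(w) for a in a_funcs]
        wt = math.exp(-sum(l * x for l, x in zip(lam, a_vals)))
        z += wt
        for i, x in enumerate(a_vals):
            grad[i] -= x * wt
    return z, grad

def normalized_descent(a_funcs, omega, dt=0.05, steps=200):
    """Euler steps of d(lambda)/dt = -grad Z / ||grad Z||, recording ||grad Z / Z||."""
    lam = [0.0] * len(a_funcs)
    ratios = []
    for _ in range(steps):
        z, grad = z_and_grad(lam, a_funcs, omega)
        norm = math.sqrt(sum(g * g for g in grad))
        ratios.append(norm / z)
        if norm == 0.0:
            break
        lam = [l - dt * g / norm for l, g in zip(lam, grad)]
    return lam, ratios

omega = [0, 1]
a_funcs = [lambda w: w + 1.0]   # E[a] lies in [1, 2]; E[a] = 0 is contradictory
lam, ratios = normalized_descent(a_funcs, omega)
```

In this example the recorded ratio starts at 1.5 and decreases toward 1, the smallest expected value of $a$ that any distribution can achieve, so the path ends at a distribution satisfying the nearest attainable constraint.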

4. Minimizing the Partition Function. In this section we consider the computational
side of finding a maximum entropy distribution. Recall that finding the maximum entropy
distribution is equivalent to minimizing a convex function, the partition function $Z(\lambda)$, as we
showed in section 2. Recall also that $Z(\lambda)$ and $\nabla Z(\lambda)$ are defined by sums over all elements
in $\Omega$. When $\Omega$ has a small number of elements, computation is simple. The gradient of Z
can be calculated exactly, and Z minimized by gradient descent. However, even in a small
letter recognition problem, for example, we may have our letters described by ten features,
each feature being able to take on thirty different values. This yields a state space with
$30^{10}$ elements, and a sum of $30^{10}$ terms (each of which involves the exponential of a sum of
m terms), as would be necessary to explicitly compute $\nabla Z$, is beyond the practical limits
of computation. This difficulty can be overcome by estimating the direction of $\nabla Z$, instead
of calculating it exactly. Before we delve too deeply (for more details see [2]), let us first
outline the general idea. The crucial observation is that we can find a distribution $p_\lambda$ such
that $\nabla Z(\lambda)/Z(\lambda)$ is just an expected value (with respect to the distribution $p_\lambda$) of some
simple function. Notice that $\nabla Z(\lambda)/Z(\lambda)$ supplies us with the direction of the gradient
(so we can minimize Z by gradient-descent type methods) and also tells us how close we
are to satisfying our constraints (see section 3). Since by using Monte Carlo type methods
we can simulate such a distribution $p_\lambda$, and since the sample mean from a simulation is
close to the real expected value, we can actually approximate $\nabla Z(\lambda)/Z(\lambda)$ without doing
a number of calculations of the order of the size of $\Omega$. The idea of using sampling to find the
direction of the gradient of Z was first proposed by Geman [3].

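Consider a distribution $p_\lambda$ on the space $\Omega$ where the probability of the event $\omega$, $p_\lambda(\omega)$,
is defined, as before, as

$$p_\lambda(\omega) = \frac{\exp\left(-\sum_{i=1}^{m} a_i(\omega)\lambda_i\right)}{\sum_{\omega' \in \Omega} \exp\left(-\sum_{i=1}^{m} a_i(\omega')\lambda_i\right)}$$

The expected value of the function $f(\omega)$ with respect to the distribution $p_\lambda$ is

$$E_\lambda(f) = \sum_{\omega \in \Omega} f(\omega)\, p_\lambda(\omega) = \frac{\sum_{\omega \in \Omega} f(\omega) \exp\left(-\sum_{i=1}^{m} a_i(\omega)\lambda_i\right)}{\sum_{\omega \in \Omega} \exp\left(-\sum_{i=1}^{m} a_i(\omega)\lambda_i\right)}$$

and for the function $-a_i(\omega)$ we have

$$E_\lambda(-a_i) = \frac{-\sum_{\omega \in \Omega} a_i(\omega) \exp\left(-\sum_{j=1}^{m} a_j(\omega)\lambda_j\right)}{\sum_{\omega \in \Omega} \exp\left(-\sum_{j=1}^{m} a_j(\omega)\lambda_j\right)} = \frac{\nabla_i Z(\lambda)}{Z(\lambda)}, \qquad i = 1, \ldots, m$$

Since all a gradient descent method needs is the direction of the gradient, we can use the
above. Also note that a measure of how close we are, at a certain $\lambda$, to satisfying the original
constraints is just $\|E_\lambda(a)\| = \|\nabla Z(\lambda)/Z(\lambda)\|$ (see section 3).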

Now that we have $\nabla Z/Z$ in terms of an expected value we come to the problem of
simulation. The goal is to find an ergodic sequence $\omega^l$ with marginal distribution $p_\lambda$. In
this way the sample expected value of $f(\omega)$ using S samples is

$$\frac{1}{S} \sum_{l=1}^{S} f(\omega^l) \tag{6}$$

and for S large this should be close to the true value of $E_\lambda(f)$.

The method we use to find an ergodic sequence requires that $\Omega$ have some sort of
neighborhood structure; we will thus revise our view of the state space $\Omega$. For the sake
of clarity we will consider each event in $\Omega$ as the state of a 1-dimensional lattice with $N_1$
elements. An element $\omega$ in $\Omega$ will thus be of the form $\omega = \{\omega_1, \ldots, \omega_{N_1}\}$. Furthermore,
each component $\omega_k$ of $\omega$ will be restricted to $N_2$ different values ($\Omega$ will thus have $N_2^{N_1}$
elements). We note that with a lattice structure $\Omega$ can get very large without much effort.

Our ergodic sequence will start at a random point $\omega^1$ in $\Omega$. We will then pick $\omega^{l+1}$
given $\omega^l$ as follows: we will fix most of the components of $\omega^{l+1}$ to be the same as the values
for $\omega^l$. The values we don't fix will allow $\omega^{l+1}$ to be any element in some subset of $\Omega$, call
it T, containing r elements, $\{t_1, \ldots, t_r\}$. We will then randomly pick an element from T,
according to the probabilities $p_\lambda(t_j)$, to be $\omega^{l+1}$. This is done as follows: we first calculate
$Z(\lambda) \cdot p_\lambda(t_j)$ for every $t_j$ in T, then we randomly (according to a uniform distribution) pick
a number between 0 and $\sum_{j=1}^{r} Z(\lambda) \cdot p_\lambda(t_j)$. This randomly chosen number will lie between
$\sum_{i<j} Z(\lambda) \cdot p_\lambda(t_i)$ and $\sum_{i \le j} Z(\lambda) \cdot p_\lambda(t_i)$ for some $t_j$ in T, and we will let $\omega^{l+1} = t_j$. Note that
$Z(\lambda) \cdot p_\lambda(t_j)$ equals $\exp\left(-\sum_{i=1}^{m} a_i(t_j)\lambda_i\right)$, which requires only on the order of m operations to
calculate. Since we usually have m less than several hundred, we are in good computational
shape. Of course, one has to be careful when picking T at each step l (i.e., deciding which
components of $\omega^l$ to fix) in order to avoid creating numerical artifacts. This is not too
difficult; one approach is to fix components in a random order and with equal likelihood.
This method of finding an ergodic sequence is known as Stochastic Relaxation [3], and is
closely related to the Metropolis Algorithm [8].
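
The sampling step can be sketched in Python for the special case where a single lattice site is freed at a time, so that T contains $N_2$ candidate states. This single-site choice, the function names, and the small test problem are assumptions for illustration; the text allows more general choices of T. The sample mean (6) of $a_i$ then estimates $-\nabla_i Z(\lambda)/Z(\lambda)$.

```python
import math
import random

def stochastic_relaxation(lam, a_funcs, n_sites, n_values, n_sweeps, rng):
    """Return one lattice state per sweep, updating one site at a time."""
    state = [rng.randrange(n_values) for _ in range(n_sites)]
    samples = []
    for _ in range(n_sweeps):
        order = list(range(n_sites))
        rng.shuffle(order)                      # free the sites in random order
        for site in order:
            # T: all states that agree with `state` except possibly at `site`.
            candidates = [state[:site] + [v] + state[site + 1:] for v in range(n_values)]
            # Unnormalized weights Z(lambda) * p_lambda(t): order-m work per candidate.
            wts = [math.exp(-sum(l * a(t) for l, a in zip(lam, a_funcs)))
                   for t in candidates]
            # Uniform number in [0, total); pick the cumulative interval it falls in.
            u = rng.random() * sum(wts)
            acc = 0.0
            chosen = candidates[-1]
            for t, wt in zip(candidates, wts):
                acc += wt
                if u < acc:
                    chosen = t
                    break
            state = chosen
        samples.append(tuple(state))
    return samples

def sample_mean(f, samples):
    """The sample expected value (6)."""
    return sum(f(s) for s in samples) / len(samples)

rng = random.Random(0)
a_funcs = [lambda w: sum(w) - 1.5]
samples = stochastic_relaxation([0.3], a_funcs, n_sites=3, n_values=2,
                                n_sweeps=20000, rng=rng)
estimate = sample_mean(a_funcs[0], samples)
```

Only the candidate weights are ever computed, never $Z(\lambda)$ itself, which is what makes the approach feasible when $\Omega$ is too large to enumerate.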

Now let us say a word about the minimization of Z, given that we have estimates
for the gradient. We note that finding the gradient is still a computationally difficult task,
and hence we desire to use a method that requires the direction of the gradient at as few
points as possible. A discrete analog of the continuous gradient descent scheme,
suggested in section 3 for handling contradictions, would prove too costly; we will therefore
assume that the constraints are not self-contradictory, so any method that drives $\nabla Z$ to
zero will be acceptable.

We implement a modification of the standard gradient descent method. Typically,
gradient descent refers to constant small steps in the direction opposite to the gradient.
Instead, to minimize the number of times we need to compute the gradient, we employ a
slight modification. We will still move in the direction opposite to the gradient, but the
size of the step we take will not be constant. When we begin we will pick a value for our
step-size $\delta$ (positive). We will always start at $\lambda^0 = 0$, since this corresponds to the uniform
distribution on $\Omega$, a logical starting point. At the point $\lambda^i$ we will find a $\lambda^{i+1}$ such that
$\lambda^{i+1} = \lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i)$, where $\delta$ is picked as follows. Since $Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ is a
convex function of $\delta$, its derivative (with respect to $\delta$) can only be zero for at most one $\delta$,
which we shall call $\delta^*$. If $\delta^*$ does not exist, then $Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ is a decreasing function
of $\delta$, and since $Z(\lambda)$ is bounded below by zero, we would have $\nabla Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ going
to zero; so for $\delta$ large enough $\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i)$ would define an adequate solution (section
3). When $\delta^*$ does exist, we see that the derivative of $Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ (with respect
to $\delta$) is negative for all $\delta$ greater than zero and less than $\delta^*$. Hence, picking $\delta$ between 0
and $\delta^*$ would yield $Z(\lambda^{i+1})$ less than $Z(\lambda^i)$. However, the closer $\delta$ is to $\delta^*$ the smaller $Z(\lambda^{i+1})$
will be. We will pick $\delta$ between $\delta^*$ and $\delta^*/e$ ($e$ around 2) by doing a binary search: if the dot
product $\nabla Z(\lambda^i) \cdot \nabla Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ (remember that $Z$ is positive, so the sign of this
term is computable even without normalizing) is negative we try $\delta = \delta/e$, if positive we try
$\delta = \delta e$. When $\nabla Z(\lambda^i) \cdot \nabla Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ switches sign from the last $\delta$ to the current
$\delta$, we will have completed our search in the direction $\nabla Z(\lambda^i)$; we will define $\lambda^{i+1}$ using the
$\delta$ (choosing from either the current or the last) for which the sign was positive. In this
manner we will be sure that $Z(\lambda^{i+1}) < Z(\lambda^i)$. Making $e$ smaller (close to, but above, one)
yields higher accuracy, but since our gradients are not exact, and since we would need to
find many more gradients, a computationally expensive task, it is not worth it. We save the
value of $\delta$ that we used last, for the next step, since it is usually of the correct magnitude.
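The step-size search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_tries` guard, and the toy gradient (independent +1/-1 components, for which grad Z / Z is tanh(lambda_j) minus the target mean) are assumptions introduced here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def descent_step(grad, lam, delta, e=2.0, max_tries=60):
    """One step along -grad(lam); returns (new_lam, delta_used).

    grad(lam) may be any positive multiple of the gradient of Z (for
    example an estimate of grad Z / Z): only signs of dot products are used.
    """
    g = grad(lam)

    def trial(d):
        return [l - d * gi for l, gi in zip(lam, g)]

    def still_descending(d):
        # A positive dot product means the derivative of Z along the
        # search direction is still negative at step size d.
        return dot(g, grad(trial(d))) > 0.0

    if still_descending(delta):
        # Grow the step until the sign flips; keep the last positive one.
        for _ in range(max_tries):
            if not still_descending(delta * e):
                break
            delta *= e
    else:
        # Shrink the step until the sign becomes positive.
        for _ in range(max_tries):
            delta /= e
            if still_descending(delta):
                break
    return trial(delta), delta


def minimize(grad, n, delta=1.0, steps=25):
    lam = [0.0] * n  # lambda^0 = 0 corresponds to the uniform distribution
    for _ in range(steps):
        lam, delta = descent_step(grad, lam, delta)  # reuse the last delta
    return lam


def toy_gradient(targets):
    """Exact grad Z / Z for independent +1/-1 components (illustrative only)."""
    return lambda lam: [math.tanh(l) - c for l, c in zip(lam, targets)]
```

For example, `minimize(toy_gradient([0.3, -0.2, 0.5]), 3)` drives each component of the toy gradient close to zero. In the setting of the text the gradient would instead be a sampled estimate, which is why a coarse factor such as e = 2 is preferred over a finer search.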

A useful feature of the above method is that it provides a means to test our sampling
method. As we increase $\delta$, the dot product $\nabla Z(\lambda^i) \cdot \nabla Z(\lambda^i - \delta \nabla Z(\lambda^i)/Z(\lambda^i))$ should be a
decreasing function. Likewise, as we decrease $\delta$ (keeping $\delta$ positive) it should be an
increasing function. If this is true for the sampled values of the dot product, we can have more
confidence in the sampling method. If it is not, we know that our estimate of the gradient is
wrong, in which case we can take appropriate action. We can increase the number of samples
we are averaging over (5 in (6)), we can start at multiple beginning points ($\omega^1, \ldots$) and
then average over the different trials ($\{\omega^1, \ldots\}$), we can discard the first
n elements of the series to get rid of the effects of the random starting point, etc. There
are many things that can, at the expense of increased computation, be done to improve the
accuracy of the sampling method.
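The consistency test described above amounts to checking that the sampled dot products do not increase with the step size. A minimal sketch follows; the function name and the tolerance parameter `tol` (slack for sampling noise) are assumptions, not part of the original method.

```python
def dot_products_decrease(samples, tol=0.0):
    """Check that the dot product is non-increasing in the step size.

    samples -- list of (delta, dot_product) pairs, in any order
    tol     -- slack allowed for sampling noise (hypothetical parameter)
    """
    ordered = sorted(samples)  # sort by delta
    return all(later[1] <= earlier[1] + tol
               for earlier, later in zip(ordered, ordered[1:]))
```

When the check fails, the text suggests remedies such as averaging over more samples or more starting points before continuing the descent.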

The next section is composed of two examples. The first is a test of our simulation
methods. We construct a distribution and extract statistics. We then find the maximum
entropy distribution. We then calculate $\nabla Z/Z$ exactly, and see that the simulation was
successful (since the values for $\nabla Z/Z$ are quite small). The state space for this example is
of size $2^{24}$, so the exact calculation of $\nabla Z$ was quite lengthy.

The  second  example  we  consider  is  the  problem  of  letter  recognition.  Sample  letters 
were  presented  and  features  extracted  from  them.  Statistics  of  the  features  conditioned  on 
the  letter  served  as  our  knowledge.  The  maximum  entropy  distribution  was  found  and  used 
to identify the sample letters. Considering the primitiveness of the features, the results are
encouraging.

5.  Results 

Example  1:  A  test  of  our  method 

In this section we conduct a test of our simulation methods. We will consider a
distribution on a large state space and use the statistics generated by the distribution to
form constraints. We use sampling methods to conduct the gradient descent (section 4), and
find a point $\lambda$ that will serve as our guess for the extremal point of the partition function.
We then compute $\nabla Z(\lambda)/Z(\lambda)$ exactly. This will serve to tell us how the statistics generated
by the distribution generated by $\lambda$ differ from the statistics of the original distribution. We
will present (on the following pages) the statistics of the original distribution, the estimated
value of $\nabla Z(\lambda)/Z(\lambda)$ and the true value of $\nabla Z(\lambda)/Z(\lambda)$.

We will have as our state space, $\Omega$, the set of all strings of length 24 composed of 1's
and -1's, so $\Omega$ has $2^{24}$ elements. We picked this state space so that the exact calculation
of $\nabla Z$ and $Z$ is possible, although quite lengthy. The distribution $p$ we use to generate
statistics is

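$$p(\omega) = \frac{\exp\left(-\sum_{i=1}^{24}\sum_{j=1}^{i-1} W(i,j)\,\omega_i\omega_j - \sum_{i=1}^{24} T(i)\,\omega_i\right)}{\sum_{\omega' \in \Omega}\exp\left(-\sum_{i=1}^{24}\sum_{j=1}^{i-1} W(i,j)\,\omega'_i\omega'_j - \sum_{i=1}^{24} T(i)\,\omega'_i\right)}$$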

where $W(i,j)$ was picked randomly to be either $+c$ or $-c$, and $T(i)$ was picked randomly
to be either $+d$ or $-d$. The values of $d$ and $c$ were chosen so that the distribution $p$ is
neither too flat nor too sharp. We used $c = 1/5$ and $d = 1/2$. Our constraints are the
expected values of $\omega_i$ and $\omega_i\omega_j$ with respect to $p$. Our constraints are thus

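$$E(\omega_i) - \sum_{\omega \in \Omega} \omega_i\, p(\omega) = 0 \quad \text{for all } i$$

$$E(\omega_i\omega_j) - \sum_{\omega \in \Omega} \omega_i\omega_j\, p(\omega) = 0 \quad \text{for all } i, j \text{ with } i > j$$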

where $E(\omega_i)$ and $E(\omega_i\omega_j)$, the expected values with respect to $p$, were computed exactly.
We can write the partition function, $Z(\lambda)$ (where $\lambda = \{\lambda_1, \ldots, \lambda_{300}\}$) as

$$Z(\lambda) = \sum_{\omega \in \Omega} \exp\left(\sum_{i=1}^{24}\sum_{k=i}^{24} \lambda_{(i-1)(25-(i/2))+k-i+1}\,\bigl(\omega_i\omega_k - E(\omega_i\omega_k)\bigr) + \sum_{i=1}^{24} \lambda_{276+i}\,\bigl(\omega_i - E(\omega_i)\bigr)\right)$$

On the computational side of things we used 20 starting points in the sampling. Each
sample involved 80 steps, $\{\omega^1, \ldots, \omega^{80}\}$, the last 50 being kept to form the expected value.
Each step was composed of randomly dividing the 24 components of the string into six
groups (four in each). We then chose a group and, holding the other groups fixed, picked
a value for it according to the distribution $p_\lambda$ (see section 4). We repeated this procedure
until each of the six groups had been allowed to vary once.
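One step of the sampling procedure above can be sketched as a block update over randomly chosen groups of four components. This is an illustrative sketch only: the function name and the `log_weight` callback (the log of the unnormalized probability of a full string under the current distribution) are assumptions, and the details of section 4 may differ.

```python
import itertools
import math
import random


def block_gibbs_sweep(w, log_weight, rng, group_size=4):
    """Resample every component of w once, in randomly chosen groups.

    w          -- list of +1/-1 values, modified in place
    log_weight -- function giving the log of the unnormalized
                  probability of a full string (hypothetical callback)
    rng        -- a random.Random instance
    """
    order = list(range(len(w)))
    rng.shuffle(order)  # random division of the components into groups
    groups = [order[k:k + group_size]
              for k in range(0, len(order), group_size)]
    for group in groups:
        configs = list(itertools.product((-1, 1), repeat=len(group)))
        logs = []
        for config in configs:  # 2**4 = 16 candidates for a group of four
            for i, value in zip(group, config):
                w[i] = value
            logs.append(log_weight(w))
        top = max(logs)  # subtract the maximum for numerical stability
        weights = [math.exp(l - top) for l in logs]
        chosen = rng.choices(configs, weights=weights, k=1)[0]
        for i, value in zip(group, chosen):
            w[i] = value
    return w
```

With 24 components and groups of four, one call varies each of the six groups once, holding the others fixed, which corresponds to one step in the text. Repeating the call 80 times and keeping the last 50 strings would mirror the sampling schedule described above.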

We note that the computational time taken to conduct all the steps of the gradient
descent (involving the estimation of $\nabla Z/Z$ several hundred times) was less than that needed
to do one exact computation of $\nabla Z/Z$.

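Recalling that $\lambda$ was our estimate, we have (see sections 3 and 4)

$$E_\lambda(\omega_i\omega_k) = E(\omega_i\omega_k) + \nabla_{(i-1)(25-(i/2))+k-i+1} Z(\lambda)/Z(\lambda)$$

$$E_\lambda(\omega_i) = E(\omega_i) + \nabla_{276+i} Z(\lambda)/Z(\lambda)$$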

$E_\lambda$ being the expected value under the distribution generated by $\lambda$. The percent error in
$E_\lambda$ is

$$\frac{\text{the true value of } \nabla_i Z/Z}{\text{the value of the associated statistic in the original system}}$$

One measure of the “fit” of the maximum entropy distribution generated by $\lambda$ is the median
value of the percent error, which was, for the $\lambda$ we found, 0.07. So, compared to the original
statistics, the errors in the statistics for the maximum entropy distribution generated by $\lambda$
were typically small.

The results on the following pages contain more detailed information about the
behavior of the maximum entropy distribution generated by $\lambda$. They are the statistics of the
original distribution, the estimated value of $\nabla Z(\lambda)/Z(\lambda)$ and the true value of $\nabla Z(\lambda)/Z(\lambda)$.
The results are presented in tables of thirty rows and ten columns. (On the first row we will have
$(\nabla_1 Z/Z, \ldots, \nabla_{10} Z/Z)$, on the second $(\nabla_{11} Z/Z, \ldots, \nabla_{20} Z/Z)$, etc.) They are presented in
such a way that the statistics in the first table have the same position in their table as the
gradient associated with that statistic has in its own table. The statistics concerning the
$E(\omega_i\omega_j)$ are thus on the top of the table.
0.0806

0.4255 

-0.5389 

0.4194 

-0.0658 

-0.2675 

0.1587 

-0.4326 

-0.1210

-0.0087 

-0.1561 

0.1139 

-0.2435 

0.1101 

-0.2267 

0.2384 

-0.1810

0.1296 

0.1696 

-0.0861 

0.1153 

-0.2382 

0.0820 

0.0935 

0.0209 

-0.2566 

-0.0483 

-0.3900 

0.0760 

-0.0324 

-0.0611 

0.1214 

-0.3010 

0.1056 

0.2089 

-0.0626 

0.0139 

0.3400 

0.2605 

-0.3386 

0.1190 

0.1702 

0.4366 

0.1741 

-0.0355 

-0.0506 

-0.1471

-0.3767 

0.1875 

-0.4381 

0.0578 

-0.1158 

-0.2955 

0.1512 

-0.1849

0.0747 

0.2564 

-0.0816 

-0.1674 

-0.0171 

0.1283 

-0.3382 

0.2184 

0.1454 

0.0056 

-0.1996 

0.1433 

0.0238 

-0.0056 

0.0272 

0.1411 

0.0856 

0.1670 

-0.1307 

-0.0344 

-0.1274 

-0.1325

0.1253 

0.0540 

-0.3286 

-0.1359 

-0.1806 

0.1306 

0.2210 

-0.0150 

-0.0332 

0.0306 

0.1210 

0.1267 

-0.0879 

0.1225 

-0.1729 

-0.0798 

0.0621

0.1784 

0.4491 

0.1652 

0.0265 

-0.1744 

0.0841 

0.0290 

0.1046 

0.0138 

-0.0861

0.2085 

-0.3274 

-0.3316 

0.1087 

-0.0558 

0.0223 

0.0815 

0.3093 

-0.1086 

0.3438 

0.0098 

0.1954 

0.1346 

0.1088 

-0.0214

-0.0820 

-0.0993 

-0.1701 

-0.0309 

-0.2G73 

-0.2077 

-0.1502 

0.3540 

-0.0165 

-0.2966 

0.3280 

0.0808 

-0.1662 

0.1384 

0.2146 

0.1215 

0.0553 

0.0776 

0.1831 

-0.1699 

-0.1746 

-0.4085 

0.1094 

0.0478 

0.3981 

0.1448 

0.2816 

0.0204 

0.7344 

0.3949 

-0.0182 

0.1038 

-0.2769 

-0.1695

-0.0471 

-0.4299 

-0.6435

-0.2733 

-0.1378

0.0675 

-0.2034 

0.0143 

-0.1123 

-0.1405

-0.0727 

0.0058 

-0.2118 

-0.1079 

-0.2927 

-0.1603 

-0.0798 

-0.3710 

0.1078 

-0.0310 

0.0578 

-0.2163 

-0.1739

-0.1297 

0.1580 

-0.5501 

-0.3170 

-0.2999 

-0.4132

-0.0192 

-0.2100 

0.2303 

0.5548 

0.3427 

-0.1447 

0.4944 

0.2766 

0.1152 

-0.1655 

0.0974 

-0.0307 

0.1474 

-0.2993 

-0.1297 

-0.2945

-0.0280 

-0.1637

0.3361 

0.2909 

-0.0898 

0.1631 

0.0670 

0.2316 

0.2813 

0.0894 

-0.2684 

-0.1118 

0.0848 

0.1245 

-0.1346 

0.2802 

-0.1257 

-0.3076 

-0.1960 

-0.1132 

-0.2609 

0.1727 

0.0115 

-0.0492

-0.1137 

0.3927 

0.4310 

0.0894 

-0.1906 

0.2345 

-0.1451 

0.1649 

0.0971 

-0.2461

-0.0658

-0.1536 

-0.0483 

-0.0762 

-0.2189

-0.0902 

0.1518 

-0.1577 

0.1587 

0.0587 

0.2528 

0.0247 

0.0028 

0.0776 

-0.2494 

0.1086 

-0.0153 

0.3228 

0.1336 

-0.0243 

0.2994 

0.1015 

-0.0400 

0.1356 

-0.3317 

-0.1941 

0.0738 

-0.1794 

0.1848 

0.1888 

0.3935 

0.1908 

0.0734 

0.0429 

0.2873 

0.1646 

-0.1003 

-0.0602 

0.1756 

-0.2132 

-0.1913 

0.3897 

0.1054 

-0.1534 

0.0292 

0.0823 

0.0419

-0.2966 

-0.0259 

0.1721 

-0.1384 

0.3438 

-0.0841 

-0.0239 

-0.6172 

-0.3503 

-0.6228 

0.C238 

-0.0911 

0.4876 

0.1390 

-0.2291 

0.4696 

-0.0226 

0.3911 

0.0047

-0.3163 

0.3157 
0.0181

0.0044 

0.0103 

-0.0576 

0.0205 

0.0526 

-0.0433 

0.0646 

0.0483 

-0.0115

0.0459 

-0.1038 

-0.0199 

0.0230 

-0.0314

0.0089 

-0.0006 

0.0140 

0.0496 

-0.0075 

-0.0019 

-0.0347 

0.0754 

-0.0593 

-0.0005 

-0.0027 

0.0156 

0.0104 

0.0496 

-0.0028 

-0.0189 

0.0001 

-0.0323 

0.0606 

-0.0181 

0.0076 

-0.0114 

0.0145 

-0.0614 

-0.0072 

0.0466 

0.0200 

0.0033 

-0.0422 

0.0862 

0.0134 

0.0287 

0.0575 

-0.0071 

0.0232 

0.0137 

0.0614 

0.0371 

-0.0227 

0.0176 

0.0183 

0.0545 

-0.0850 

-0.0454 

0.0182 

-0.0331 

0.0112 

0.0113 

0.0776 

-0.0295 

-0.0248 

0.0096 

0.0523 

0.0855 

-0.0403 

0.0834 

-0.0371 

0.0368 

-0.0412 

0.0673 

0.0240 

-0.0730 

-0.0221 

-0.0160 

0.0146 

-0.0263 

-0.0442 

-0.0489 

0.0198 

0.0603 

0.0524 

-0.0090 

-0.0063 

0.0133 

-0.0652 

0.0252 

-0.0466 

-0.0238 

0.0099 

-0.0020 

0.0516 

0.0211 

-0.0190 

-0.0321

-0.0085 

-0.0406 

-0.0124

-0.0615 

-0.0315 

0.0615 

0.0178

-0.0483 

-0.0144

0.0403 

0.1209 

0.0250 

0.0564 

0.0125 

-0.0317 

-0.0714 

-0.0513 

-0.0110 

-0.0483 

-0.1200 

0.0366 

0.0016 

0.0281 

-0.0513 

0.0288 

-0.0188 

-0.0363 

-0.0077 

-0.0792 

0.0160 

0.0388 

0.0323 

0.0200 

-0.0552 

0.0275 

0.0142 

-0.0188 

0.0411 

0.0186 

0.0623 

-0.0883 

0.0643 

0.0026 

0.0519 

0.0355 

0.0610 

-0.ri08 

0.0230 

0.0034 

-0.0136 

-0.0389 

0.0128 

0.0354 

-0.0573 

-0.0112 

-0.0207 

0.0238 

0.0049 

-0.0582 

0.0352 

0.0075 

-0.0270 

-0.0332 

0.0172 

0.0277 

0.0107 

0.0112 

0.0152 

-0.0301 

-0.0083 

-0.0053 

-0.0170 

-0.0432 

-0.0054 

-0.0363 

-0.0052 

0.0264 

0.0278

0.0160 

-0.0186 

-0.0580 

-0.0392 

0.0497 

-0.0578 

-0.0270 

-0.0151 

-0.0147 

0.0072 

-0.0130 

-0.0220 

-0.0437 

-0.0520 

0.0346 

0.0025 

0.0689 

0.0136 

0.0174 

-0.021' 

0.0441 

0.0726 

-0.0476 

0.0547 

-0.0176 

0.0389 

-0.0030 

0.0127 

0.0142 

0.0230 

0.0347 

-0.0603 

-0.0647 

0.0342 

0.0191 

-0.0370 

0.0079 

0.0057 

-0.0448

-0.0213 

-0.0560 

-0.0749 

0.0524 

-0.0717 

-0.0453 

0.0335 

-0.0058 

-0.0162 

0.0747 

0.0260 

0.0130 

-0.0213 

-0.0192 

-0.0337 

0.0176 

-0.0155 

-0.0457 

0.0396 

-0.0188

-0.0057 

-0.0195 

0.0259 

0.0063 

-0.0233

0.0006 

-0.0170 

-0.0228 

-0.0263 

-0.0110 

0.0063 

0.0075 

0.0340 

-0.0543 

0.0441 

-0.0648 

-0.0273 

0.0446 

-0.1162 

-0.0200 

-0.0304 

0.0106 

0.0494 

-0.0770 

0.0483 

0.0162 

0.0110 

-0.0118 

0.0310 

0.0231 

0.0208 

-0.0165 

0.0306 

0.0119 

-0.0227 

-0.0221 

0.0133 

-0.0107 

0.0304

0.0365 

-0.0204 

-0.0463 

-0.0299 

0.0240 

-0.0724 

0.0058 

-0.0108 

-0.0077 

-0.0415 

-0.0424 

0.0195 

-0.0346 

-0.0599 

-0.0191 

-0.0524 

0.0213 

-0.0826

0.0023 

-0.0123 

0.0168 

0.0367 

-0.1384

0.0327 

0.0507 

Table of estimated $\nabla Z(\lambda)/Z(\lambda)$
-0.0233

0.0004 

0.0116 

-0.0173 

0.0168 

0.0138 

-0.0258

-0.0143 

-0.0304 

-0.0050 

0.0088 

0.0029 

0.0204 

0.0477 

0.0017 

0.0095 

-0.0259

0.0077 

0.0056 

0.0386 

0.0042 

-0.0317 

-0.0117

-0.0152 

-0.0069 

-0.0288

-0.0116 

-0.0147 

0.0091

0.0114 

0.0049 

-0.0140 

0.0130 

0.0087 

0.0282 

-0.0004 

-0.0071 

-0.0014 

-0.0171 

0.0035

0.0081 

-0.0035

0.0215 

-0.0188

0.0214 

-0.0027 

-0.0297 

0.0244 

0.0249 

0.0135 

-0.0136 

-0.0196 

-0.0130 

0.0241 

-0.0123 

0.0262 

0.0044 

0.0198 

0.0049 

-0.0208 

0.0026 

-0.0019 

0.0015 

0.0301 

-0.0010

-0.0112 

0.0188 

-0.0195 

-0.0145 

0.0258 

-0.0129

0.0259 

0.0057 

-0.0064 

0.0000 

-0.0185

-0.0125 

-0.0276 

-0.0139 

0.0189 

0.0029 

0.0281 

-0.0028 

-0.0001 

0.0279 

-0.0257 

-0.0197 

0.0059 

0.0068 

0.0244 

-0.0388 

0.0301 

0.0062 

0.0022 

0.0206 

0.0093 

0.0014 

0.0183 

0.0120 

-0.0009

0.0000

0.0401 

0.0010 

0.0006 

-0.0094 

0.0058 

-0.0060 

0.0031 

0.0023 

-0.0132 

0.0213 

0.0045 

-0.0235

-0.0115 

0.0045 

0.0065 

-0.0008 

0.0152 

-0.0188 

-0.0323 

0.0065 

-0.0001 

-0.0046 

0.0057 

-0.0079 

0.0034 

0.0077 

-0.0069 

0.0025 

0.0225 

-0.0036 

0.0243 

-0.0186 

-0.0180 

0.0083 

-0.0144 

-0.0173 

-0.0136 

-0.0128 

0.0019 

0.0324 

-0.0094 

-0.0030 

-0.0065 

-0.0238

-0.0071 

-0.0105 

0.0030 

0.0184 

0.0057 

-0.0095 

-0.0159 

0.0169 

0.0132 

-0.0046 

-0.0230 

-0.0011 

0.0057 

-0.0166 

0.0074 

-0.0256

-0.0285

-0.0130 

0.0238 

0.0146 

-0.0065 

0.0043 

-0.0159 

-0.0240 

0.0129 

-0.0079 

0.0203 

0.0316 

0.0142 

0.0012 

0.0306 

0.0129 

-0.0136 

-0.0041 

0.0124 

-0.0097 

-0.0131 

0.0050 

0.0218 

0.0098 

0.0016 

0.0128 

0.0009 

-0.0316 

-0.0288 

0.0036 

0.0049 

-0.0353 

-0.0014 

-0.0258 

-0.0010 

-0.0069

-0.0120 

0.0307 

-0.0157 

-0.0292 

0.0070 

-0.0269 

-0.0089 

-0.0022 

0.0189 

-0.0253 

-0.0034 

0.0195 

-0.0147 

-0.0053 

-0.0039 

0.0180 

0.0210 

0.0176 

0.0094 

-0.0033 

0.0144 

0.0152 

-0.0099 

0.0133 

-0.0107 

-0.0375 

-0.0175 

-0.0173 

-0.0007 

0.0311 

0.0001 

0.0005 

0.0259 

0.0120 

0.0049 

-0.0465 

-0.0200 

-0.0194 

-0.0037 

-0.0021 

0.0017 

-0.0100 

-0.0199 

-0.0240

-0.0005 

-0.0344 

0.0008 

-0.0130 

0.0004 

-0.0170 

-0.0019 

0.0167 

-0.0035 

-0.0038 

-0.0087 

-0.0272 

0.0035 

0.0050 

-0.0045 

0.0021 

-0.0245 

0.0099 

-0.0006 

-0.0143

0.0111 

0.0005 

0.0045 

0.0213 

-0.0234 

0.0275 

-0.0046 

0.0039 

-0.0108 

0.0041 

0.0229 

0.0031 

0.0017 

-0.0148 

0.0223 

-0.0016 

-0.0240 

-0.0092 

-0.0002 

0.0066 

0.0200 

0.0274 

0.0393 

0.0226 

-0.0146

0.0205 

0.0019 

-0.0198 

-0.0030 

-0.0054

0.0204 

0.0006 

0.0308 

-0.0525 

0.0347 

0.0002 

-0.0478 

-0.0175 

0.0187 

Table of true $\nabla Z(\lambda)/Z(\lambda)$
Example 2: Letter recognition

Let  us  consider  the  problem  of  invariant  letter  recognition.  We  will  be  presented  with 
a  picture  of  a  letter  of  unknown  size,  orientation  and  font,  and  we  wish  to  find  out  which 
letter  it  is.  For  the  sake  of  simplicity  we  will  use  simple  images  with  just  two  grey  levels 
(black  and  white),  and  we  will  just  consider  the  capital  letters  A, . . . ,  G. 

There are many ways to approach this problem; the one we will consider is based on
feature extraction. We will deal with the invariance of the problem by extracting features
(scalars) that are independent of the orientation or size of the letter. Our expert system
will be a distribution on the space of features and labels, where the latter identify the
letter. This distribution will be used to find the probabilities of the labels conditioned on
the observed features (e.g., $P(\text{‘the letter is an A’} \mid \text{feature}_1 = 5, \text{feature}_2 = 6) = .3$).

Choosing  the  features  is  a  crucial  task,  and  should  be  given  as  much  consideration  as 
the  construction  of  the  expert  system  that  uses  them.  Features  can  be  roughly  separated 
into  two  groups,  local  and  global.  Global  features  deal  with  the  whole  picture  and  are  what 
we  used  in  the  results  presented  in  the  following  pages.  Local  features  deal  with  the  local 
behavior  of  the  picture  elements.  Hence,  local  features  are  ideal  for  occluded  pictures.  Local 
features  seem  more  powerful  and,  it  is  our  belief,  will  be  essential  for  a  true  solution  to  the 
invariant  character  recognition  problem. 

The features we used in our example were non-standard. They were picked because
they seemed reasonable and not too difficult to compute. They mostly deal with holes
and indentations. A hole is a white (non-letter) region completely surrounded by the
letter (typically A has a hole, C does not), and an indentation is a white region that
is connected, lies in the convex hull of the letter (the convex hull of the set S is the smallest
convex set containing S), and yet is not a hole. Some thought will show that this is exactly
what we mean by an indentation (typically O has no indentations, T has two). Below are
listed twelve of the features we use.

1  The  size  of  the  largest  hole  /  The  size  of  the  convex  hull  of  the  letter 

2  The  size  of  the  second  largest  hole  /  The  size  of  the  convex  hull  of  the  letter 

3  The  size  of  the  third  largest  hole  /  The  size  of  the  convex  hull  of  the  letter 

4  The  size  of  the  largest  indentation  /  The  size  of  the  convex  hull  of  the  letter 

5  The  size  of  the  second  largest  indentation  /  The  size  of  the  convex  hull  of  the  letter 

6  The  size  of  the  third  largest  indentation  /  The  size  of  the  convex  hull  of  the  letter 

7  The  ratio  of  longest  to  shortest  axis  of  the  largest  hole 

8  The  ratio  of  longest  to  shortest  axis  of  the  largest  indentation 

9  The  ratio  of  longest  to  shortest  axis  of  the  second  largest  indentation 

10  The  ratio  of  longest  to  shortest  axis  of  the  third  largest  indentation 

11  The  total  area  of  the  indentations  in  the  largest  hole  /  The  size  of  the  letter 

12  The  total  area  of  the  indentations  in  the  largest  indentation  /  The  size  of  the  letter 
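The hole-based features (1 to 3 above) need the sizes of the holes in a two-level image. A minimal sketch, assuming the picture is given as rows of 0 (white) and 1 (letter) and that white regions are 4-connected; the function name and this representation are assumptions made for illustration, not the authors' code:

```python
from collections import deque


def hole_sizes(image):
    """Sizes (in pixels) of the holes, largest first.

    image -- list of equal-length rows; 1 = letter, 0 = white.
    A hole is a white region that never reaches the picture border.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] != 0 or seen[r0][c0]:
                continue
            size, touches_border = 0, False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # flood fill one white region
                r, c = queue.popleft()
                size += 1
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] == 0):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                holes.append(size)
    return sorted(holes, reverse=True)
```

Dividing the first three entries of the result by the area of the convex hull of the letter would give features 1 to 3. Indentations additionally require the convex hull itself and are not covered by this sketch.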

We also have several other features that deal with the points that span the convex
hull.  We  construct  these  features  as  follows.  Let  our  original  set  of  points  be  the  smallest 
set  that  spans  the  convex  hull  of  the  letter.  At  every  step  remove  one  point  from  our  set  of 
points,  picked  to  maximize  the  area  spanned  by  the  remaining  points.  Continue  doing  this 
until  no  points  are  left.  Our  final  features  shall  be 

13  The  number  of  points  that  span  the  convex  hull  of  the  letter 

14  The  area  spanned  by  six  remaining  points  /  The  area  of  the  convex  hull 

15  The  area  spanned  by  five  remaining  points  /  The  area  of  the  convex  hull 
16 The area spanned by four remaining points / The area of the convex hull

17  The  area  spanned  by  three  remaining  points  /  The  area  of  the  convex  hull 

These  features  are  useful  since  they  tell  us  how  curved  the  letter  is.  For  example  the 
letter ‘E’ is square so the area left after removing all but four points should be large, but
when  only  three  points  remain  the  number  should  be  much  smaller. 
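The point-removal construction behind features 13 to 17 can be sketched as follows, assuming the hull is supplied as an ordered list of its vertices (so that deleting a vertex leaves a valid polygon) and has nonzero area. The function names are illustrative assumptions.

```python
def polygon_area(points):
    """Shoelace area of a polygon given as ordered (x, y) vertices."""
    total = 0.0
    for k in range(len(points)):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def hull_reduction_features(hull, sizes=(6, 5, 4, 3)):
    """Return {number of remaining points: area ratio} for the given sizes."""
    full_area = polygon_area(hull)
    points = list(hull)
    ratios = {}
    while len(points) >= min(sizes):
        if len(points) in sizes:
            ratios[len(points)] = polygon_area(points) / full_area
        if len(points) == min(sizes):
            break
        # Remove the vertex whose removal leaves the largest area.
        drop = max(range(len(points)),
                   key=lambda k: polygon_area(points[:k] + points[k + 1:]))
        del points[drop]
    return ratios
```

For a unit square the result is `{4: 1.0, 3: 0.5}`, which illustrates the remark about the letter ‘E’: the four-point ratio is large and the three-point ratio is much smaller. Feature 13 is simply `len(hull)`.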

The knowledge we used to form our expert system deals with the expected values of
the features, the features squared, and the products of selected features, all conditioned on
the letter (e.g., $E(\text{feature}_1 \mid \text{the letter is an A}) = C_1$, $E((\text{feature}_1)^2 \mid \text{the letter is an A}) = C_2$,
$E(\text{feature}_1 \cdot \text{feature}_2 \mid \text{the letter is an A}) = C_3$). We also give our system the very important
piece of knowledge that the letters are of equal probability (each of probability 1/7). It
would be nice to use all the products of features as constraints, but with seven letters and
17 features we would have several thousand constraints and this is computationally difficult.
In the experiment that yielded the results on the following pages we used the conditional
probabilities of only fifteen different products. The total number of constraints was thus
350.
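One accounting that reproduces the stated total is shown below. It assumes that the equal-probability condition contributes one constraint per letter; the text does not spell this out, so this split is an assumption.

```latex
\underbrace{7 \times 17}_{\text{means}}
+ \underbrace{7 \times 17}_{\text{squares}}
+ \underbrace{7 \times 15}_{\text{products}}
+ \underbrace{7}_{\text{letter probabilities}}
= 119 + 119 + 105 + 7 = 350
```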


On the computational side we use several techniques to improve the behavior of the
gradient descent. The first technique deals with the constraints themselves. In general
a constraint is of the form $E(G_i) = C_i$, where $G_i$ is some function and $C_i$ is a constant.
Typically we find $C_i$ by taking a test sample and using the sample mean of $G_i$. However,
this does have problems. Since our gradient descent is not exact, we typically end up with
$\nabla_i Z/Z$ small for all $i$ and typically of the same order of magnitude, but not exactly zero.
Thus if we have as a constraint $E(100 \cdot G_j) = 100 \cdot C_j$ for some $j$, then $E(G_j) = C_j$ will
come much closer to being true in the resulting maximum entropy distribution than if we
had $E(G_j) = C_j$ as the constraint. Also, there is a problem caused by wanting our expert
system to recognize things not in the sample that formed our constraints. In the following
results we trained our system with 5 samples of each letter. Now, what will happen if we
try to recognize a letter that was not in the population we used to train the system? We
would like it to be recognized, especially if it is similar to the original population. This
does not always happen. One particular case of this problem is caused by boundary effects.
If a feature has range 0 to 1, and all the sample letter C’s had value 0 for this feature,
then the only way to satisfy the constraint $E(\text{feature} \mid \text{letter is a C}) = \text{sample mean} = 0$
is to have $P(\text{feature} = 0 \mid \text{letter is a C}) = 1$. If we present a C which has value .001 for
this feature, it will not be recognized. While this problem could be cured by having a large
sample (and should be), it and the previous problem can both be dealt with by scaling and
slightly modifying the constraint functions. For the full details we refer the reader to [2].
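The effect of scaling a constraint can be made explicit. The lines below assume the partition function has the same form as in Example 1, $Z(\lambda) = \sum_{\omega} \exp(\sum_i \lambda_i (G_i(\omega) - C_i))$; this form, and the symbol $\varepsilon$ for the residual left by the inexact descent, are assumptions made for illustration.

```latex
\frac{\nabla_j Z(\lambda)}{Z(\lambda)} = E_\lambda(G_j) - C_j
\quad \text{(unscaled constraint)},
\qquad
\frac{\nabla_j Z(\lambda)}{Z(\lambda)} = E_\lambda(100\,G_j) - 100\,C_j
= 100\,\bigl(E_\lambda(G_j) - C_j\bigr)
\quad \text{(scaled constraint)}.
```

If the descent stops with $|\nabla_j Z/Z| \approx \varepsilon$ in both cases, the scaled constraint leaves $|E_\lambda(G_j) - C_j| \approx \varepsilon/100$, which is why the scaled version is satisfied more closely.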

Now let us consider the sampling method itself. In this problem the constraints have
a rather odd form: almost all of them are conditioned on the letter. This can lead to
difficulties  in  the  sampling  method.  When  the  letter  is  an  A,  for  example,  the  features 
tend  to  have  certain  values,  as  the  constraints  specify.  At  every  step  in  the  sampling  we  go 
through  the  feature  vector,  holding  most  of  the  features  fixed  and  then  picking  those  that 
are  not  fixed  according  to  a  distribution.  However  when  the  label  is  ‘A’,  the  features  tend 
to  stay  within  a  certain  range.  When  it  comes  time  to  fix  the  features  and  vary  the  label, 
the  distribution  that  we  use  to  pick  a  label,  being  generated  by  features  that  correspond  to 
an A, will emphasize the label ‘A’. This is to be expected, since we can think of the label
‘A’  as  corresponding  to  some  region  in  the  state  space,  and  forming  a  sort  of  ‘well’  in  the 
energy  landscape  (a  region  of  very  likely  events,  corresponding  to  ‘A’s,  surrounded  by  a 
region  of  low  probability  that  corresponds  to  feature  values  not  associated  with  any  letter). 
Once  such  a  ‘well’  is  entered  it  can  be  difficult  to  get  out  of.  So,  if  the  label  ‘A’  is  turned  on 
it  tends  to  stay  on,  and  our  sample  will  quite  possibly  over-emphasize  one  particular  letter 
at  the  expense  of  the  rest. 

Since all the constraints involving the features are conditional, we can use the following
change in the sampling method to cure the problem mentioned above. We will first fix the label
at ‘A’. We will then conduct gradient descent on $Z$ until the values of $\nabla_i Z/Z$ are small for
all the $i$’s corresponding to constraints conditioned on the label being ‘A’. Then we fix the
label at ‘B’ and continue. After we have gone through all the letters (in our case A, ..., G) we
start sampling normally (letting the label vary). Since the values of $\nabla_i Z/Z$ are small for
all $i$’s corresponding to conditional constraints, we need only conduct gradient descent until
the constraint that all letters be equally likely is (close to being) satisfied.

Now let us present the results. The letters we wished to identify are on the following
pages. They are the same letters that were used to find the sample means in the constraints.
The probabilities of the labels conditioned on the observed features, given by the maximum
entropy expert system, are provided underneath the letters. Only the top three probabilities
are listed for each letter, in the interest of saving space. The energies are also listed, where
the energy is $\sum_i -a_i(\omega)\lambda_i$ (where $\omega$ is the element of $\Omega$ corresponding to the feature vector
plus the hypothesized label, and $\lambda$ is the result of our minimization of $Z$). The energies are
given to provide some comparison between different letters (“this E looks more like an E
than that E”).
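The listed probabilities and energies appear to be related by probability proportional to exp(-energy), normalized over the labels. The sketch below assumes that relation and normalizes over only the three listed hypotheses for picture A1; the function name is an illustrative assumption.

```python
import math


def label_probabilities(energies):
    """Map {label: energy} to {label: probability}, with p proportional to exp(-energy)."""
    lowest = min(energies.values())  # shift energies for numerical stability
    weights = {label: math.exp(-(e - lowest)) for label, e in energies.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Energies listed for picture A1 (top three hypotheses only).
a1 = label_probabilities({"A": 3.8487754, "E": 11.7124262, "F": 4.5665264})
```

Under this assumption the values in `a1` agree closely with the probabilities listed for picture A1, since the four unlisted labels carry very little probability.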
Picture  Hypothesis  Probability of Hypothesis  Energy of Hypothesis
A1       it is an A  0.6719273                  3.8487754
A1       it is an E  0.0002583                  11.7124262
A1       it is an F  0.3277985                  4.5665264
A2       it is an A  0.9809950                  2.1446524
A2       it is an E  0.0012927                  8.7764645
A2       it is an F  0.0177039                  6.1594334
A3       it is an A  1.0000000                  0.2035229
A3       it is a B   0.0000000                  18.8109589
A3       it is an E  0.0000000                  23.3654041
A4       it is an A  0.9920666                  1.6209198
A4       it is an E  0.0000050                  13.8176394
A4       it is an F  0.0079282                  6.4502902
A5       it is an A  0.9974482                  5.4865880
A5       it is an E  0.0001584                  14.2343798
A5       it is an F  0.0023898                  11.5205803



Picture  Hypothesis  Probability of Hypothesis  Energy of Hypothesis
B1       it is a B   1.0000000                  5.7593293
B1       it is a C   0.0000000                  26.1249256
B1       it is a D   0.0000000                  26.0998402
B2       it is a B   1.0000000                  6.2038503
B2       it is a C   0.0000000                  26.3224182
B2       it is an E  0.0000000                  26.6758900
B3       it is an A  0.0000000                  29.8165340
B3       it is a B   1.0000000                  9.1290588
B3       it is an E  0.0000000                  30.7371597
B4       it is a B   0.9999986                  5.9010286
B4       it is an E  0.0000006                  20.2056236
B4       it is an F  0.0000003                  20.8549023
B5       it is a B   0.9999995                  5.5978689
B5       it is an E  0.0000000                  23.7445774
B5       it is an F  0.0000004                  20.2729683

Picture  Hypothesis  Probability of Hypothesis  Energy of Hypothesis
C1       it is a C   0.6604182                  -1.9814487
C1       it is an E  0.0184070                  1.5986940
C1       it is a G   0.3191914                  -1.2543660
C2       it is a C   0.6313014                  -1.3120894
C2       it is an E  0.1306149                  0.2634406
C2       it is a G   0.2368994                  -0.3319414
C3       it is a C   0.5955555                  -1.6064481
C3       it is an E  0.0437944                  1.0035404
C3       it is a G   0.3588811                  -1.0999445
C4       it is a C   0.6606517                  -0.5129181
C4       it is an E  0.0676684                  1.7656897
C4       it is a G   0.2695387                  0.3835969
C5       it is a C   0.6386604                  -1.4192295
C5       it is an E  0.0491251                  1.1457740
C5       it is a G   0.3097548                  -0.6956378
Picture  Hypothesis  Probability of Hypothesis  Energy of Hypothesis
D1       it is an A  0.0000051                  19.0304546
D1       it is a B   0.0011396                  13.6119480
D1       it is a D   0.9988480                  6.8360224
D2       it is a B   0.0003877                  12.2251072
D2       it is a D   0.9994236                  4.3704495
D2       it is a G   0.0000976                  13.6045826
D3       it is a B   0.0000000                  89.0601807
D3       it is a D   1.0000000                  -12.7670460
D3       it is an F  0.0000000                  82.0110168
D4       it is a B   0.0005200                  12.7396383
D4       it is a D   0.9994300                  5.1784315
D4       it is a G   0.0000154                  16.2575455
D5       it is an A  0.0000039                  17.1760368
D5       it is a B   0.0003292                  12.7477312
D5       it is a D   0.9996585                  4.7292614

Picture  Hypothesis  Probability of Hypothesis  Energy of Hypothesis
E1       it is an A  0.0000728                  7.3868618
E1       it is an E  0.9531993                  -2.0928898
E1       it is an F  0.0467241                  0.9226741
E2       it is a C   0.0000507                  7.1914301
E2       it is an E  0.9832692                  -2.6817935
E2       it is an F  0.0166214                  1.3983997
E3       it is an A  0.0003051                  8.8423738
E3       it is a C   0.0000295                  11.1771812
E3       it is an E  0.9996634                  0.7478167
E4       it is an A  0.0003749                  6.4038081
E4       it is an E  0.9340211                  -1.4169102
E4       it is an F  0.0655886                  1.2391865
E5       it is an A  0.0000976                  6.7060304
E5       it is an E  0.9618846                  -2.4899280
E5       it is an F  0.0380000                  0.7413796
Picture  Hypothesis  Probability of Hypothesis  Energy of Hypothesis
F1       it is an A  0.0009967                  5.9281058
F1       it is an E  0.0000012                  12.6835003
F1       it is an F  0.9990017                  -0.9819657
F2       it is an A  0.0136954                  3.9498260
F2       it is an E  0.0032687                  5.3824973
F2       it is an F  0.9830301                  -0.3237556
F3       it is an A  0.0021422                  5.9868760
F3       it is an E  0.0016594                  6.2422500
F3       it is an F  0.9961985                  -0.1552548
F4       it is an E  0.8462385                  -0.0166720
F4       it is an F  0.1515211                  1.7034043
F4       it is a G   0.0020051                  6.0284510
F5       it is an A  0.0003445                  8.3701258
F5       it is an E  0.8101026                  0.6073851
F5       it is an F  0.1895521                  2.0598819
Picture   Hypothesis    Probability of Hypothesis   Energy of Hypothesis

G1        it is an C    0.1939279                     1.1415343
G1        it is an E    0.2315701                     0.9641383
G1        it is an G    0.5542700                     0.0913689
G2        it is an C    0.2378499                     0.9435763
G2        it is an E    0.1392609                     1.4788672
G2        it is an G    0.6008883                     0.0168073
G3        it is an C    0.3128236                     0.1166506
G3        it is an E    0.0234408                     2.7078128
G3        it is an G    0.6587726                    -0.6280884
G4        it is an C    0.2702022                     0.0306356
G4        it is an E    0.0150385                     2.9191945
G4        it is an G    0.7069070                    -0.9310930
G5        it is an C    0.3253015                    -0.3527871
G5        it is an E    0.1510101                     0.4146184
G5        it is an G    0.5172358                    -0.8165337
ABSTRACT.  Methodologies  for  treating  random  field  problems  by  finite
elements  are  described.  The  methods  are  based  on  second-moment  analysis 
procedures,  but  they  are  remarkably  robust  and  are  able  to  deal  with 
substantial  nonlinearities.  Both  static  and  dynamic  problems  have  been 
considered.  Some  applications  to  linear  and  elastic-plastic  structures  are 
described  along  with  potential  applications  to  fracture  which  are  now  being 
considered. 

I. INTRODUCTION. The probabilistic analysis of engineering problems by
finite element methods is currently a dynamic area of research. The most
widespread statistical approach for analyzing probabilistic systems is
simulation, the direct Monte Carlo Simulation [1-3] being the most frequently
used. Since the accuracy of the statistical results depends on the
number of samples, the analysis can be prohibitively expensive for large
systems. Although simulation techniques can be applied to linear and
nonlinear systems, they are in general quite inefficient. Thus, there is
considerable interest in non-statistical approaches, such as second-moment
analysis and Probabilistic Finite Element Methods (PFEM). For linear systems,
second-moment analysis techniques [3,4] have proven to be effective in
structural mechanics. However, the extension of second-moment analysis to
nonlinear structural dynamics is not currently feasible. Consequently, recent
developments in the statistical analysis of linear and nonlinear structural
dynamics have been advanced with PFEM.

Although the development of PFEM is a relatively new area of research,
the literature is already quite broad [5-13]. The authors' research has
encompassed both static and dynamic linear PFEM as well as recent advances in
nonlinear PFEM. The development of PFEM for static linear analysis with
material randomness is discussed in Ref. [9]. In the application of PFEM for
linear dynamics, secular terms arise in the statistical distributions, causing
erroneous results [8,11,13]. In Refs. [8,9], the PFEM is extended to static
and dynamic nonlinear analysis with material and geometric nonlinearities.
Extensive research has been done in the application of PFEM for
elastic/plastic materials [9,10]. Recently, the PFEM has been developed using a
potential energy variational principle [11]. In this manner, problems with
random materials, shapes, body forces, and boundary conditions can be easily
incorporated into the PFEM. In Ref. [12], the probabilistic potential energy
variational principle is extended to a three-field Hu-Washizu variational
principle. The PFEM has proved to be a very efficient means of
non-statistical analysis for linear and nonlinear continua in statics and
dynamics. Currently, the majority of the research in PFEM is directed toward
improved nonlinear analysis.

*The support of NASA Lewis Grant No. NAG3-535 for this research and the
encouragement of Dr. Christos Chamis are gratefully acknowledged.

It has been observed [8] that secular terms arise in nonlinear
transient analysis as well. Elimination of these secular terms is not as
straightforward as in linear transient analysis, and is a current topic of
research. The nonlinear probabilistic analysis herein is therefore
restricted to statics.

In the next section, the linear transient PFEM equations and the scheme
for eliminating secularities are outlined. In Section III, the PFEM equations
for nonlinear statics are derived. In Section IV, the effectiveness of PFEM
and the scheme for eliminating secularities are demonstrated. In Section V,
the conclusions and potential applications to fracture are discussed.

II. PFEM FOR TRANSIENT ANALYSIS. As a consequence of applying PFEM for
transient analysis, secular terms arise in the higher order equations and
hence in all statistical results [8]. Many theoretical methods have been
developed for eliminating secularities and the literature is quite extensive.
Secular terms erroneously result from the perturbation process, causing the
solutions of the higher order equations to increase indefinitely with time or
until damped away. Thus, secularities cause all statistical results, such as
the expectation and variance of displacement, to be unbounded for long times.
The characteristics of secularities and a method for their removal have been
developed for a single degree-of-freedom random oscillator [13], but to the
authors' knowledge no methods have been developed for PFEM. Consequently,
there is a considerable need to develop means for eliminating secular terms in
PFEM.


Initially, consider a structural dynamic system governed by the following
linear system of equations, which are developed from a finite element
discretization:

\[ \mathbf{M}\ddot{\mathbf{d}} + \mathbf{C}\dot{\mathbf{d}} + \mathbf{K}\mathbf{d} = \mathbf{F} \tag{2.1} \]

where M, C, and K are the mass, damping and stiffness matrices, respectively;
F is the external force vector; d is the displacement vector; and a
superscript dot represents time differentiation. The mass is assumed to be
deterministic whereas the stiffness and damping are assumed to be functions of
a generalized variance vector Var(b) where b(x) is a random field. The basic
idea in applying second-moment analysis to develop PFEM involves expanding all
random functions about the mean value of the random field b(x), denoted
by $\bar{b}(x)$. That is, for a small parameter $\epsilon$, the random function
d(b,t) is expanded about $\bar{b}(x)$ via a second-order perturbation at a
given x, and the random field is discretized along with the random functions
as follows:
\[ \mathbf{d} = \mathbf{d}_0 + \epsilon \sum_{i=1}^{q} \mathbf{d}_{b_i}\,\Delta b_i + \frac{1}{2}\,\epsilon^2 \sum_{i,j=1}^{q} \mathbf{d}_{b_i b_j}\,\Delta b_i\,\Delta b_j \tag{2.2} \]

where $\mathbf{d}_0$, $\mathbf{d}_{b_i}$, and $\mathbf{d}_{b_i b_j}$ represent
the mean, the first order variation about $\bar{b}$, and the second order
variation about $\bar{b}$ of the displacement; $\Delta b_i$ represents the
first order variation of $b_i$ about $\bar{b}_i$; and q is the number of

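random variables. Complete details of this procedure can be found in Refs.
[7,8]. Similar expansions are done for F, K, and C. Substitution of
these expansions for d, F, K, and C into Eq. (2.1) yields the following three
equations for $\mathbf{d}_0$, $\mathbf{d}_{b_i}$, and $\mathbf{d}_2$:

Zeroth Order Equation

\[ \mathbf{M}\ddot{\mathbf{d}}_0 + \mathbf{C}_0\dot{\mathbf{d}}_0 + \mathbf{K}_0\mathbf{d}_0 = \mathbf{F}_0 \tag{2.3} \]

First Order Equation

\[ \mathbf{M}\ddot{\mathbf{d}}_{b_i} + \mathbf{C}_0\dot{\mathbf{d}}_{b_i} + \mathbf{K}_0\mathbf{d}_{b_i} = \mathbf{F}_{1b_i}, \qquad i = 1, \ldots, q \tag{2.4} \]

where

\[ \mathbf{F}_{1b_i} = \mathbf{F}_{b_i} - \left( \mathbf{C}_{b_i}\dot{\mathbf{d}}_0 + \mathbf{K}_{b_i}\mathbf{d}_0 \right), \qquad i = 1, \ldots, q \tag{2.5} \]

Second Order Equation

\[ \mathbf{M}\ddot{\mathbf{d}}_2 + \mathbf{C}_0\dot{\mathbf{d}}_2 + \mathbf{K}_0\mathbf{d}_2 = \mathbf{F}_2 \tag{2.6} \]

where

\[ \mathbf{F}_2 = \sum_{i=1}^{q} \left\{ \tfrac{1}{2}\mathbf{F}_{b_i b_i} - \tfrac{1}{2}\mathbf{C}_{b_i b_i}\dot{\mathbf{d}}_0 - \tfrac{1}{2}\mathbf{K}_{b_i b_i}\mathbf{d}_0 - \mathbf{C}_{b_i}\dot{\mathbf{d}}_{b_i} - \mathbf{K}_{b_i}\mathbf{d}_{b_i} \right\} \mathrm{Var}(b_i) \tag{2.7} \]

and

\[ \mathbf{d}_2 = \frac{1}{2} \sum_{i=1}^{q} \mathbf{d}_{b_i b_i}\,\mathrm{Var}(b_i) \tag{2.8} \]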

The solution of Eqs. (2.3) and (2.6) yields $\mathbf{d}_0$ and $\mathbf{d}_2$,
respectively, whereas the solution of Eqs. (2.4) requires q solutions to obtain
$\mathbf{d}_{b_i}$. In Eqs. (2.4) through (2.8) it has been assumed that $b_i$
and $b_j$ are uncorrelated for $i \neq j$, thereby enabling the full covariance
matrix $\mathrm{Cov}(b_i, b_j)$ to be expressed as a diagonal variance matrix,
equal to $\mathrm{Var}(b_i)$ for $i = j$ and zero for $i \neq j$. It is noted
that in order to reduce the computations further, a transformed random
variable can be introduced [9]. After the zeroth order equation is solved
for $\mathbf{d}_0$, the q first order forcing functions given by Eq. (2.5) can
be evaluated. Since the first order forcing function is a function of the
zeroth order solution, part of its effect will be resonant, causing
secularities in the first order solution [11,13]. The second order forcing
function is a function of the first order solution in addition to the zeroth
order solution, thus secularities also result in the second order solution.
When damping is present in the system, the effect of secularities persists
until it is damped away at long durations. The secular terms in the first and
second order solutions erroneously result from the perturbation process.

Therefore, the method presented in this paper for removal of secularities in
PFEM involves removing the resonant part from the first and second order
forcing functions.

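The mean and variance of displacement are defined by

\[ E[\mathbf{d}] = \int_{-\infty}^{\infty} \mathbf{d}\, f(\mathbf{b})\, d\mathbf{b} \tag{2.9} \]

and

\[ \mathrm{Var}(\mathbf{d}) = \int_{-\infty}^{\infty} (\mathbf{d} - \mathbf{d}_0)^2 f(\mathbf{b})\, d\mathbf{b} \tag{2.10} \]

respectively, where f(b) is the probability density function. Once Eqs.
(2.3), (2.4), and (2.6) are solved for $\mathbf{d}_0$, $\mathbf{d}_{b_i}$, and
$\mathbf{d}_2$, respectively, substitution of the expansion for d given by
Eq. (2.2) into Eqs. (2.9) and (2.10) yields the second order accurate
expectation and first order accurate variance of displacement given by

\[ E[\mathbf{d}] = \mathbf{d}_0 + \mathbf{d}_2 \tag{2.11} \]

and

\[ \mathrm{Var}(\mathbf{d}) = \sum_{i=1}^{q} \left( \mathbf{d}_{b_i} \right)^2 \mathrm{Var}(b_i) \tag{2.12} \]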

respectively. Since $\mathbf{d}_2$ in Eq. (2.11) has secular terms present, the
expectation of displacement will increase indefinitely with time. Similarly,
the variance of displacement will also increase indefinitely with time due to
secularities in $\mathbf{d}_{b_i}$. Expressions similar to Eqs. (2.11) and
(2.12) can be developed for strain and stress. The statistical results for
strain and stress will also be invalid for long times. Thus, there is a
considerable need to develop methods to eliminate secularities in PFEM so that
all statistical results are bounded.
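
As a rough illustration of how the second-moment estimates in Eqs. (2.11) and (2.12) are formed, the sketch below applies them to a hypothetical one degree-of-freedom static spring, d = F/k, with a random stiffness k. The spring, the function names, and the small Monte Carlo check are assumptions made for this example only; they are not the transmission tower or plate problems analyzed in this paper, and the time dependence (and hence the secular terms) is deliberately left out.

```python
import random


def spring_second_moment(force, k_mean, k_var):
    """Second-moment estimates for d(k) = force / k with random stiffness k."""
    d0 = force / k_mean                # zeroth order solution at the mean stiffness
    d_k = -force / k_mean ** 2         # first derivative of d with respect to k
    d_kk = 2.0 * force / k_mean ** 3   # second derivative of d with respect to k
    d2 = 0.5 * d_kk * k_var            # scalar analogue of Eq. (2.8)
    mean = d0 + d2                     # scalar analogue of Eq. (2.11)
    variance = d_k ** 2 * k_var        # scalar analogue of Eq. (2.12)
    return mean, variance


def spring_monte_carlo(force, k_mean, k_var, samples, seed=0):
    """Direct simulation of the same spring with a normally distributed k."""
    rng = random.Random(seed)
    sigma = k_var ** 0.5
    d = [force / rng.gauss(k_mean, sigma) for _ in range(samples)]
    mean = sum(d) / samples
    variance = sum((x - mean) ** 2 for x in d) / (samples - 1)
    return mean, variance


# Stiffness with mean 2.0 and variance 0.01, i.e. a 5% coefficient of variation.
pfem_mean, pfem_var = spring_second_moment(1.0, 2.0, 0.01)
mcs_mean, mcs_var = spring_monte_carlo(1.0, 2.0, 0.01, samples=20000)
```

The second-moment estimate needs only the mean solution and its derivatives, whereas the simulation needs one solution per sample; for a finite element model each of those solutions is a full analysis, which is the cost argument made in the Introduction.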

There is a vast amount of literature available dealing with the
analytical removal of secularities, but no methods have been developed for the
numerical elimination of secularities in PFEM. The method presented herein
for numerical elimination of secular terms involves using Fourier analysis to
separate the resonant and non-resonant parts of the first and second order
forcing functions. By performing Fourier analysis on the time series
for $\mathbf{F}_{1b_i}(\mathbf{d}_0)$ and $\mathbf{F}_2(\mathbf{d}_0, \mathbf{d}_{b_i})$
with a Fast Fourier Transform (FFT), the time series can be separated as follows

\[ \mathbf{F}_{1b_i}(\mathbf{d}_0) = \mathbf{F}_{1b_i}^{R} + \mathbf{F}_{1b_i}^{NR} \tag{2.13} \]

and

\[ \mathbf{F}_2(\mathbf{d}_0, \mathbf{d}_{b_i}) = \mathbf{F}_2^{R} + \mathbf{F}_2^{NR} \tag{2.14} \]

where the superscripts R and NR represent the resonant and non-resonant parts,
respectively. Once the forcing functions are separated, only the non-resonant
parts of $\mathbf{F}_{1b_i}$ and $\mathbf{F}_2$ are evaluated in the first and
second order equations given by Eqs. (2.4) and (2.6), yielding solutions which
are devoid of secularities. In order to remove the resonant part of the
forcing functions, the frequency spectra of the system must be known. To aid
in this part of the analysis, a highly efficient eigenvalue routine using
Lanczos coordinates is incorporated to obtain a reduced system tridiagonal
eigenproblem [14]. The resonant part is then removed by weighting all
coefficients in the Fourier series which fall within a designated range of the
system natural frequencies [13]. That is, coefficients which are very close to
the natural frequencies are almost entirely eliminated whereas coefficients
which are separated from the natural frequencies are unaffected. Applicable
frequency weighting windows include cosine and (cosine)^2. This provides an
effective and efficient procedure for eliminating secularities from PFEM so
that all statistical results are bounded. Another advantage of using a Lanczos
coordinate reduced basis is the solution of a reduced system of equations
[15].
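
The frequency-domain weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain discrete Fourier transform stands in for the FFT, the natural frequencies are supplied directly rather than computed by a Lanczos routine, and the notch shape `1 - cos^2`, the `half_width` parameter, and all function names are hypothetical choices made for the example.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each time sample."""
    n = len(spectrum)
    return [(sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                 for m in range(n)) / n).real
            for k in range(n)]


def notch_weight(freq, natural_freqs, half_width):
    """Weight in [0, 1]: 0 at a natural frequency, 1 outside the band.

    Inside the band the weight is 1 - cos^2(pi * df / (2 * half_width)),
    so the amount removed follows a (cosine)^2 window centred on the
    natural frequency.
    """
    weight = 1.0
    for fn in natural_freqs:
        df = abs(freq - fn)
        if df < half_width:
            weight = min(weight, 1.0 - math.cos(0.5 * math.pi * df / half_width) ** 2)
    return weight


def remove_resonant_part(force, dt, natural_freqs, half_width):
    """Return the non-resonant part of a sampled forcing time series."""
    n = len(force)
    filtered = []
    for m, coeff in enumerate(dft(force)):
        # Bins above n/2 hold the negative frequencies; weight by |frequency|.
        freq = min(m, n - m) / (n * dt)
        filtered.append(coeff * notch_weight(freq, natural_freqs, half_width))
    return idft(filtered)


# One second of data at 64 samples/s, so the DFT bins are 1 Hz apart.
dt = 1.0 / 64
times = [k * dt for k in range(64)]
force = [math.sin(2 * math.pi * 8 * t) + math.sin(2 * math.pi * 20 * t) for t in times]
non_resonant = remove_resonant_part(force, dt, natural_freqs=[8.0], half_width=2.0)
```

In this toy case the 8 Hz component, which coincides with the assumed natural frequency, is removed and the 20 Hz component passes through unchanged. The width of the band is a tuning choice; a wider band removes more of the resonant forcing but also more of the legitimate response near resonance.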

III. PFEM FOR NONLINEAR STATICS. The PFEM equations for nonlinear
statics of a continuum, incorporating material nonlinearities, can be derived
using the approach followed in Section II. The discretized equilibrium
equations governing the nonlinear statics are:

\[ \mathbf{f}(\mathbf{d}, \mathbf{b}) = \mathbf{F}(\mathbf{b}) \tag{3.1} \]

where f, F and d are the internal force, external force and displacement
vectors, respectively, and b is the discretized random vector of size q [9].
The randomness could arise from loading and/or material properties. The
zeroth, first and second-order equations corresponding to Eq. (3.1) are:


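Zeroth-Order Equation

\[ \bar{\mathbf{f}} = \bar{\mathbf{F}} \tag{3.2} \]

First-Order Equation

\[ \mathbf{K}\,\mathbf{d}_{b_i} = \mathbf{F}_{1b_i}, \qquad i = 1, \ldots, q \tag{3.3a} \]

and

\[ \mathbf{F}_{1b_i} = \mathbf{F}_{b_i} - \mathbf{f}_{b_i}, \qquad i = 1, \ldots, q \tag{3.3b} \]

where K is the tangent stiffness matrix.

Second-Order Equation

\[ \mathbf{K}\,\mathbf{d}_2 = \mathbf{F}_2 \tag{3.4a} \]

where

\[ \mathbf{d}_2 = \frac{1}{2} \sum_{i,j=1}^{q} \mathbf{d}_{b_i b_j}\,\mathrm{Cov}(b_i, b_j) \tag{3.4b} \]

and

\[ \mathbf{F}_2 = \sum_{i,j=1}^{q} \left\{ \tfrac{1}{2}\mathbf{F}_{b_i b_j} - \tfrac{1}{2}\mathbf{f}_{b_i b_j} - \mathbf{K}_{b_i}\mathbf{d}_{b_j} \right\} \mathrm{Cov}(b_i, b_j) \tag{3.4c} \]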

The computational effort in solving Eqs. (3.3) through (3.4) can be
reduced significantly by transforming the full covariance matrix,
$\mathrm{Cov}(b_i, b_j)$, to a diagonal variance matrix, $\mathrm{Var}(c_i)$
[9]. Usually, only the n (n < q) highest values of $\mathrm{Var}(c_i)$ are
necessary [9,10]. Using the random vector c, the first and second order
equations are simplified to:

First-Order Equations

\[ \mathbf{K}\,\mathbf{d}_{c_i} = \mathbf{F}_{1c_i}, \qquad i = 1, \ldots, q \tag{3.5a} \]

where

\[ \mathbf{F}_{1c_i} = \mathbf{F}_{c_i} - \mathbf{f}_{c_i}, \qquad i = 1, \ldots, q \tag{3.5b} \]

and

\[ \mathbf{f}_{c_i} = \int_{\Omega} \mathbf{B}^{\mathsf{T}} \boldsymbol{\sigma}_{c_i}\, d\Omega \,\Big|_{\mathbf{d} = \bar{\mathbf{d}}} \tag{3.5c} \]

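Second-Order Equations

\[ \mathbf{K}\,\mathbf{d}_2 = \mathbf{F}_2 \tag{3.6a} \]

where

\[ \mathbf{F}_2 = \sum_{i=1}^{q} \left\{ \tfrac{1}{2}\mathbf{F}_{c_i c_i} - \tfrac{1}{2}\mathbf{f}_{c_i c_i} - \mathbf{K}_{c_i}\mathbf{d}_{c_i} \right\} \mathrm{Var}(c_i) \tag{3.6b} \]

\[ \mathbf{K}_{c_i} = \int_{\Omega} \mathbf{B}^{\mathsf{T}}\, \mathbf{C}^{T}_{c_i}\, \mathbf{B}\, d\Omega \tag{3.6c} \]

and

\[ \mathbf{f}_{c_i c_i} = \int_{\Omega} \mathbf{B}^{\mathsf{T}} \boldsymbol{\sigma}_{c_i c_i}\, d\Omega \,\Big|_{\mathbf{d} = \bar{\mathbf{d}}} \tag{3.6d} \]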

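Once $\bar{\mathbf{d}}$, $\mathbf{d}_{c_i}$ and $\mathbf{d}_2$ are obtained, the
mean and autocovariance matrices of the displacement can be computed from:

\[ E[\mathbf{d}] = \bar{\mathbf{d}} + \mathbf{d}_2 \tag{3.7a} \]

and

\[ \mathrm{Cov}(d^{\,i}, d^{\,j}) = \sum_{r=1}^{q} d^{\,i}_{c_r}\, d^{\,j}_{c_r}\, \mathrm{Var}(c_r) \tag{3.7b} \]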
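Next, the mean and autocovariance matrices of the stress can be similarly
computed. At any point (usually an integration point) in the domain $\Omega$,
$\bar{\boldsymbol{\sigma}}$ is computed from Eq. (3.2) and:

\[ [\boldsymbol{\sigma}]_{c_i} = \boldsymbol{\sigma}_{c_i}\Big|_{\mathbf{d} = \bar{\mathbf{d}}} + \mathbf{C}^{T}\, \mathbf{B}\, \mathbf{d}_{c_i} \tag{3.8a} \]

\[ [\boldsymbol{\sigma}]_{c_i c_i} = \boldsymbol{\sigma}_{c_i c_i}\Big|_{\mathbf{d} = \bar{\mathbf{d}}} + \mathbf{C}^{T}_{c_i}\, \mathbf{B}\, \mathbf{d}_{c_i} + \mathbf{C}^{T}\, \mathbf{B}\, \mathbf{d}_{c_i c_i} \tag{3.8b} \]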

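where [ ] denotes total derivative and $\mathbf{C}^{T}$ represents the tangent
constitutive matrix. Thus,

\[ E[\boldsymbol{\sigma}] = \bar{\boldsymbol{\sigma}} + \boldsymbol{\sigma}_2 \tag{3.9a} \]

where

\[ \boldsymbol{\sigma}_2 = \frac{1}{2} \sum_{i=1}^{q} [\boldsymbol{\sigma}]_{c_i c_i}\, \mathrm{Var}(c_i) \tag{3.9b} \]

and

\[ \mathrm{Cov}(\sigma^{\,i}, \sigma^{\,j}) = \sum_{r=1}^{q} [\sigma^{\,i}]_{c_r}\, [\sigma^{\,j}]_{c_r}\, \mathrm{Var}(c_r) \tag{3.9c} \]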

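Evaluation of Internal Force/Stress Derivatives

It is seen that, in all the first and second-order equations derived in
Eqs. (3.5a) and (3.6b), the derivatives of the internal force and stress are
required. Direct evaluation of these derivatives is clearly not possible,
as the internal force and stress are implicit functions of the random vector
c. Usually in such cases, these derivatives are replaced by their
finite-difference counterparts [16,17]. Employing central-difference
approximations,

\[ \boldsymbol{\sigma}_{c_i}\Big|_{\mathbf{d} = \bar{\mathbf{d}}} \approx \frac{1}{2\,\Delta c_i} \left( \boldsymbol{\sigma}^{+} - \boldsymbol{\sigma}^{-} \right)\Big|_{\mathbf{d} = \bar{\mathbf{d}}} \tag{3.10a} \]

and

\[ \boldsymbol{\sigma}_{c_i c_i}\Big|_{\mathbf{d} = \bar{\mathbf{d}}} \approx \frac{1}{\Delta c_i\,\Delta c_i} \left( \boldsymbol{\sigma}^{+} - 2\bar{\boldsymbol{\sigma}} + \boldsymbol{\sigma}^{-} \right)\Big|_{\mathbf{d} = \bar{\mathbf{d}}} \tag{3.10b} \]

where

\[ \boldsymbol{\sigma}^{+} = \boldsymbol{\sigma}(\bar{\mathbf{c}} + \Delta\mathbf{c}_i) \tag{3.10c} \]

\[ \boldsymbol{\sigma}^{-} = \boldsymbol{\sigma}(\bar{\mathbf{c}} - \Delta\mathbf{c}_i) \tag{3.10d} \]

and $\Delta\mathbf{c}_i$ is defined as

\[ \Delta\mathbf{c}_i^{\mathsf{T}} = (0, \ldots, 0, \Delta c_i, 0, \ldots, 0). \tag{3.10e} \]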

The  first  and  second-order  derivatives  of  the  internal  force  can  then  be 
obtained  from  Eqs.  (3.5c)  and  (3.6c),  respectively.  The  derivatives  of  the 
tangent  constitutive  matrix,  in  Eqs.  (3.6c)  and  (3.8b),  can  also  be 
approximated  similarly. 
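
The central-difference approximations of Eqs. (3.10a) and (3.10b) can be sketched for a single scalar response as follows. This is only an illustration: the `response` callable is a hypothetical stand-in for one stress or internal-force component evaluated with the displacement held at its mean, and `toy_stress` is an invented closed-form function chosen so that the exact derivatives are known.

```python
def central_difference_derivatives(response, c_mean, i, dc):
    """Approximate the first and second derivatives of response w.r.t. c_i.

    response: callable taking a list of random-variable values, returning a float.
    c_mean:   list with the mean values of the random variables.
    i:        index of the variable being perturbed.
    dc:       perturbation size (Delta c_i).
    """
    plus = list(c_mean)
    minus = list(c_mean)
    plus[i] += dc     # c_mean + Delta c_i, as in Eq. (3.10c)
    minus[i] -= dc    # c_mean - Delta c_i, as in Eq. (3.10d)
    r_plus = response(plus)
    r_mean = response(c_mean)
    r_minus = response(minus)
    first = (r_plus - r_minus) / (2.0 * dc)                   # Eq. (3.10a)
    second = (r_plus - 2.0 * r_mean + r_minus) / (dc * dc)    # Eq. (3.10b)
    return first, second


def toy_stress(c):
    """Invented response: exact derivatives w.r.t. c[0] are 2*c0*c1 and 2*c1."""
    return c[0] ** 2 * c[1]


first, second = central_difference_derivatives(toy_stress, [2.0, 3.0], 0, 1e-3)
```

Each random variable costs two extra response evaluations, and the mean evaluation is shared by all of them. The step size is a trade-off: too large a step increases truncation error, while too small a step amplifies round-off in the second difference.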

IV. NUMERICAL EXAMPLES. The method presented in Section II for the
elimination of secularities in transient PFEM is demonstrated by application
to a multiple degree-of-freedom transmission tower. The PFEM solution with
secularities removed (NOS) is compared to the standard PFEM solution with
secularities (SEC) and to a Monte Carlo Simulation (MCS) with
400 samples. The random material properties are incorporated into the system
by choosing Young's Modulus for elements 1-4 and 6-9 as uncorrelated normal
random variables with a coefficient of variation of 5%. Rayleigh stiffness
proportional damping is added to the system, enabling the model to incorporate
random stiffness and random damping. The performance of the method is
presented in Figs. 1 and 2 for sinusoidal excitation.

The problem statement is presented in Fig. 1 for a 15 node/32 bar
transmission tower with 26 degrees-of-freedom. The system has a first mode
natural frequency of 8.7 cps and Rayleigh stiffness proportional damping with
damping ratio equivalent to 0.1% of first mode. The expectation and variance
of the x-displacement of node 2 are shown in Figs. 2a and 2b, respectively,
for a (cosine)^2 weighting window. Since the second order solution is
negligible compared to the zeroth order solution, secularities are only
slightly evident in the expectation. In Fig. 2b, the variance of displacement
exhibits secularities in the SEC which die out after 6 sec due to damping.
Initially all three methods are in agreement, but the SEC begins to deviate
from the MCS due to secularities until they are damped away. The method
presented in this paper (NOS) removes the secularities from the SEC, bringing
it into agreement with the MCS. Initially, the NOS removes too much from the
SEC, which is probably due to the solution being heavily dominated by the
transient part. The method presented is valid for coefficients of variation
up to 20%, as in the PFEM.

In the next application, the PFEM procedure for nonlinear statics is
demonstrated. The problem analyzed is an elastic-plastic plate with a
circular hole, subjected to uniform compressive loading (Fig. 3). The
load is assumed to be random with a coefficient of variation of 10% and a
correlation length (λ) of 3L (Fig. 3). The response statistics, viz. the mean
and variance with respect to incremental loading and the spatial correlation
of the response, are studied. The mean and variance of the displacement at
Node 400 are plotted in Figs. 4a and 4b. These results show good agreement
with those obtained by Monte Carlo Simulation (MCS) [4,10] with 400
realizations. The maximum coefficient of variation of the displacement is
found to be ~10%.
The mean and variance of the compressive stress in Element 15 are
plotted in Figs. 4c and 4d. The mean stress is in good agreement with the
simulation results, whereas the variance of stress shows some disagreement,
particularly at larger loads. As the load is increased, the variance of
stress increases and, after a certain load, starts to decrease. Since the
material is assumed to be elastic-plastic, with a ratio of elastic modulus to
plastic modulus of 100, it is nearly perfectly plastic for large strains.
Therefore, once the material starts yielding at a point, the stress is nearly
bounded above by the yield stress. This causes the variance of stress to fall
to a near-zero level with increasing loading (Fig. 4d). The maximum
coefficient of variation of the stress is also found to be ~10%.

The displacement correlation (w.r.t. Node 400) along the y-axis and the
stress correlation (w.r.t. Element 7) along the x-axis are plotted in Figs. 5a
and 5b. The displacement shows almost complete correlation (i.e., 1.0).
However, while the stress shows complete correlation near Element 7, it drops
drastically to very low correlations near the ends. The elements near the
circular hole are in a plastic state and the stresses in these elements are
near the yield stress. At the same time the elements far from the hole are
elastic and the stresses vary appreciably with changes in load. The
correlation between the elastic stress and the plastic stress, which changes
very little with load, is very low and this explains the low stress
correlation near the hole. The low stress correlation near the far end of the
x-axis (Fig. 5b) seems due to the low variance of stress there. The mean
stress and the variance of stress along the x-axis are plotted in Figs. 5c and
5d, respectively, for a particular load. The stress variance is low at both
ends and peaks in between, near the hole. The stress in this region is in
the transition state from elastic to plastic and so the stress variance is
high.

V. CONCLUSIONS. The validity of PFEM for uncertainties as large as 10%
(i.e., a coefficient of variation of 10%) and under substantial material
nonlinearity has been demonstrated in the previous section. The
effectiveness of the scheme in removing secularities from the transient
statistics has also been shown. Based on this scheme, an extension can be
made to remove secularities from nonlinear transient statistics as well. Also,
the PFEM can be extended to handle geometric randomness. Efforts are being
made to achieve these two goals.

The PFEM and related procedures [5-13] have been applied in the past to
study the effect of randomness in structural dynamics, linear and nonlinear
response of continua, and buckling and collapse analysis. While such wide
applications of PFEM in structural mechanics have been achieved, from the
point of view of reliability and failure analysis the statistical aspects of
fracture mechanics assume importance. Numerical methods, such as PFEM, for
studying these aspects are very scarce. Fracture related quantities such
as fracture toughness, initial and ultimate yield stress, and the number, size
and orientation of the cracks, voids and inclusions are usually hard to
determine exactly. These and other quantities, which govern the crack growth,
the rate of crack growth, the direction of crack propagation and the eventual
failure of the structure, can be modelled as random material or geometric
quantities. Fracture studies incorporating such randomness in PFEM could give
insight into the fracture statistics. Based on the experience obtained so far,
such studies using PFEM seem promising.
Problem Statement 1: Transmission Tower with 15 Nodes/32 Bars.

Problem Statement 2: Elastic-Plastic Plate with a Circular Hole.

Fig. 4b Comparison of the Variance of Displacement at Node 400 between
PFEM and MCS.

Fig. 5a Spatial Correlation of Displacement along y-axis, w.r.t. the
Displacement at Node 400, by PFEM and MCS.

Fig. 5d Comparison of the Variance of Stress along x-axis at load step
between PFEM and MCS.


LIMIT  THEOREMS  FOR  THE  SIZE  EFFECT 
IN  THE  LIFETIME  DISTRIBUTION 
OF  A  FIBROUS  COMPOSITE 


S.  Leigh  Phoenix  and  Chia-Chyuan  Kuo* 

Sibley  School  of  Mechanical  and  Aerospace  Engineering 

Cornell  University 
Ithaca,  New  York  14853 

ABSTRACT.  A  composite  material  is  a  parallel  arrangement  of 
stiff  brittle  fibers  in  a  flexible  matrix.  Under  load  fibers  fail,  and 
the  loads  of  failed  fibers  are  locally  redistributed  onto  nearby 
survivors  through  the  matrix.  In  this  paper  we  develop  a  new 
technique  for  computing  the  probability  of  failure  under  a  previously 
studied  model  of  the  failure  process.  In  this  model,  known  as  the 
chain-of-bundles  model,  failure  occurs  when  all  fibers  fail  in  at 
least  one  bundle.  A  recursion  and  limit  theorem  are  obtained  which 
apply  separately  to  static  strength  and  fatigue  lifetime  depending  on 
the  composite  loading  and  the  probability  model  for  the  failure  of 
individual fibers under their own loads. The limit theorem yields an
approximation for the distribution function for composite lifetime
which is of the form $1 - [1 - W(t)]^{mn}$ where W(t) is a characteristic
distribution function and mn is the composite volume, reflecting a
size effect. A similar result holds also for static strength. In both
cases  such  a  result  was  conjectured  several  years  ago.  This  limit 
theorem  is  obtained  from  the  recursion  upon  applying  a  key  theorem 
in  the  theory  of  the  renewal  equation.  In  the  proofs  three  technical 
conditions  arise  which  must  be  verified  in  specific  applications.  In 
the  case  of  static  strength  these  conditions  are  quite  easy  to  verify, 
but  in  the  case  of  fatigue  lifetime  the  verification  is  generally 
difficult,  and  entails  considerable  numerical  computation. 


*Present Address: Kendall Company, 95 West St., Walpole, MA 02081.

I. INTRODUCTION. In this paper we present a new recursive
technique  and  a  limit  theorem  for  an  earlier,  idealized  model  of  the 
failure  process  in  a  fibrous  composite  material.  For  previous  work 
on  static  strength  see  Harlow  and  Phoenix  (1978,  1981,  1982), 
Harlow  (1985),  and  Smith  (1980,  1982,  1983),  and  in  the  case  of 
time  dependent  fatigue  see  Tierney  (1982)  and  Phoenix  and  Tierney 
(1983).  The  present  paper  is  an  abbreviated  version  of  a 
forthcoming  paper  by  Kuo  and  Phoenix  (1987). 

To  review  the  model,  we  consider  a  simple  composite  which  is 
an  arrangement  of  n  parallel  filaments  along  a  line  to  form  a  planar 
tape,  or  in  a  circle  to  form  a  tube.  The  loading,  which  is  a  specified 
function  of  time,  is  simple  tension  in  the  fiber  direction.  The  actual 
failure  process  to  be  modeled  begins  when  fibers  fail  randomly  in 
both  time  and  position,  and  locally  their  original  loads  are 
transferred  to  adjacent  fibers  which  then  become  overloaded.  In 
time  some  of  these  overloaded  fibers  fail  too,  and  clusters  of 
several  contiguous  breaks  appear.  Eventually  one  of  these  clusters 
grows  to  an  unstable  size,  turns  into  a  catastrophic  crack  and  fails 
the  composite. 

To model this failure process the composite is partitioned into
a series of m short sections called bundles, each containing n fiber
elements of length δ, the effective load transfer length. The failure
process is localized within the bundles, and the composite is treated
as a weakest-link arrangement of its m bundles, each carrying the
externally applied load. The mn fiber elements are treated as
statistically independent entities under an identical prescribed load
history on each (though their failure times within a bundle will be
dependent because of the load transfer process which will cause the
individual fiber load histories to differ); thus the bundles are
statistically independent. Throughout we speak of load on a
'force-per-fiber' basis; that is, the load is the total external force on the
composite divided by n. Henceforth our modeling will be in terms of
the fiber elements, and for brevity in the notation we refer to these
as the 'fibers'.
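
The size effect in the weakest-link form quoted in the abstract, 1 - [1 - W(t)]^{mn}, can be illustrated numerically. The sketch below is only an illustration: the function name and the sample values of W and mn are assumptions for the example, and the form is evaluated with `log1p`/`expm1` simply to avoid round-off when W is very small and mn is very large.

```python
import math


def composite_failure_probability(w, volume):
    """Evaluate 1 - (1 - w)**volume for 0 <= w < 1 without loss of precision."""
    return -math.expm1(volume * math.log1p(-w))


# The same characteristic probability W = 1e-6 at two composite volumes mn.
small = composite_failure_probability(1e-6, 1_000)
large = composite_failure_probability(1e-6, 1_000_000)
```

With these hypothetical numbers the failure probability grows from roughly 0.001 to roughly 0.63 as the volume increases a thousandfold, which is the sense in which larger composites are weaker under a weakest-link model.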

Load-sharing  rule  for  fibers.  If  the  bundle  load  is  C,  a  surviving 
fiber  carries  load  KrC  where  Kr  is  called  a  load  concentration  factor, 
and  r  is  the  number  of  consecutive  failed  fibers  immediately 
adjacent  to  this  survivor  (counting  on  both  sides).  Also  set  K0  ■  1. 
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As  in  most  previous  analyses,  the  first  load-sharing  rule  we 
consider  is 

(1.1)  Kr  -  1  +  r/2  ,  r- 0,1,2 . 

wherein  the  load  of  a  failed  fiber  is  redistributed  in  equal  portions 
onto  its  two  nearest  surviving  neighbors,  one  on  each  side.  In  linear 
bundles  which  may  have  fibers  failed  at  the  bundle  edge,  this  rule 
has  a  slight  deficiency  since  there  are  no  exterior  fibers  to  carry 
some  of  the  shifted  load.  To  avoid  these  difficulties  we  will  also 
consider  circular  bundles  which  have  no  such  edges.  Here  we  need  to 
take care in the situation where only one last fiber remains since
one would expect that fiber to carry the total load nℓ, whereas K_{n-1} ℓ
= (n+1)ℓ/2. Thus we will consider instead K'_{n-1} = n.

An alternate rule is based on elastic calculations in a planar
lattice by Gotlib, El'yashevich and Svetlov (1973), namely K_r =
(1+r)^{1/2}, r = 0, 1, 2, .... This rule more accurately models the fiber
loads once r becomes large and reflects results from fracture
mechanics where the stresses at the crack tip grow as the square
root of the crack length.
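The two load-sharing rules can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names and the string encoding of a linear bundle ('O' for a survivor, 'X' for a failed fiber) are hypothetical conveniences, not notation from the paper, and the edge and last-survivor adjustments discussed above are not applied.

```python
import math

def k_equal_split(r: int) -> float:
    """Rule (1.1): K_r = 1 + r/2."""
    return 1.0 + r / 2.0

def k_sqrt(r: int) -> float:
    """Alternate rule: K_r = (1 + r)^(1/2)."""
    return math.sqrt(1.0 + r)

def fiber_loads(config: str, bundle_load: float, k=k_equal_split):
    """Load on each fiber of a linear bundle; failed fibers carry None.

    r counts the consecutive failed fibers immediately adjacent to a
    survivor, on both sides together.
    """
    n = len(config)
    loads = []
    for i, state in enumerate(config):
        if state == "X":
            loads.append(None)
            continue
        r = 0
        j = i - 1
        while j >= 0 and config[j] == "X":
            r += 1
            j -= 1
        j = i + 1
        while j < n and config[j] == "X":
            r += 1
            j += 1
        loads.append(k(r) * bundle_load)
    return loads

print(fiber_loads("OXXOXO", 1.0))
```

For an interior pattern such as OXXOXO under rule (1.1) the survivor loads add up to n times the bundle load, which is the equal-split redistribution described in the text.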

An  important  feature  exploited  in  our  later  analysis  is  that 
none  of  the  load  of  a  failed  fiber  is  redistributed  beyond  the  two 
flanking  nearest  survivors.  The  mechanical  analysis  of  Hedgepeth 
(1961)  shows  this  assumption  to  be  somewhat  oversimplified  (as  it 
would  be  for  the  alternate  rule),  but  the  results  of  Pitt  and  Phoenix 
(1983)  suggest  that  this  shortcoming  is  minor,  provided  that  most 
of  the  redistributed  load  appears  on  the  nearest  survivors. 

Load histories. We let ℓ(t), t ≥ 0 be the load history which we
apply to the composite. In general ℓ(t) can be any positive function
of t ≥ 0. However, in the setting of static strength we work with
the linear load ℓ(t) = t, since in this case the failure time and the
load at failure will be identical. In fatigue lifetime the simplest
model is ℓ(t) = L, t ≥ 0. Note that the actual fiber load histories will
differ from ℓ(t) as neighboring fibers fail.

Distribution functions for lifetime. We let H_{m,n}(t;ℓ) be the
distribution function for the failure time of the composite under
load ℓ(t), t ≥ 0. Also let G_n(t;ℓ) be the distribution function for the
failure time of a single bundle. Since the individual fibers, and thus
the bundles, will be statistically independent entities we have the
simple connection

(1.2)  H_{m,n}(t;\ell) = 1 - [1 - G_n(t;\ell)]^m,  t \ge 0.

The main task is thus to calculate G_n(t;ℓ).
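As a small numerical aside, the weakest-link relation (1.2) can be evaluated as follows. This is a sketch only: the function name is hypothetical, and the use of log1p/expm1 is an implementation choice made here to keep precision when G_n is tiny and m is large, not something prescribed by the paper.

```python
import math

def composite_cdf(g_n: float, m: int) -> float:
    """Weakest-link relation (1.2): H = 1 - (1 - G_n)^m for m independent bundles."""
    if g_n >= 1.0:
        return 1.0
    # (1 - g)^m = exp(m * log(1 - g)); log1p and expm1 avoid cancellation
    return -math.expm1(m * math.log1p(-g_n))

print(composite_cdf(1e-12, 10**6))
```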

Model for the failure of fibers. We let F(t;λ), t ≥ 0 be the
distribution function for the failure time of a single fiber under its
load history λ(t), t ≥ 0. To model fiber failure it is convenient to
use the concept of a standard representative fiber as introduced by
Tierney (1982). First associate with the fiber a random variable Z
which follows the unit exponential distribution

(1.3)  F(z) = 1 - \exp\{-z\},  z \ge 0.

Then given the load history λ(t), t ≥ 0 on the fiber, let Θ(t;λ) be the
cumulative hazard function (CHF) for failure, and assume it to be a
non-anticipating functional of λ. Also we assume Θ(t;λ) is
increasing and right-continuous in t ≥ 0 for fixed λ, and Θ(t;λ) is
monotone in λ; that is, if λ_1(s) ≥ λ_2(s) for all 0 ≤ s ≤ t then Θ(t;λ_1) ≥
Θ(t;λ_2). Then under λ, the failure time T of the fiber is the smallest
value of T ≥ 0 for which

(1.4)  \Theta(T;\lambda) \ge Z.

By this construction we have

(1.5)  F(t;\lambda) = 1 - \exp\{-\Theta(t;\lambda)\},  t \ge 0.

Under a common fiber load history λ the lifetimes of the
individual fibers are assumed to be statistically independent; that
is, the Z's are independent from fiber to fiber. However, in a bundle
the  individual  fiber  load  histories  will  begin  to  differ  as  neighboring 
fibers  fail,  and  the  fiber  lifetimes  will  become  dependent.  This  is 
where  the  main  complication  arises. 

In the setting of static strength, a fiber has random strength X
which is independent of both its load history, and the strength of
other fibers. We assume X has distribution function F_0(x), x ≥ 0,
which we write as

(1.6)  F_0(x) = 1 - \exp\{-\Theta_0(x)\},  x \ge 0

for some suitable increasing function Θ_0(x), x ≥ 0; of course, F_0(x) is
common to all fibers. Then in this case the CHF becomes

(1.7)  \Theta(t;\lambda) = \sup_{0 \le s \le t} \Theta_0(\lambda(s)).

Some specific cases of this model are as follows: The first
and simplest case is known as the 'pure flaw' model which has been
considered by Harlow (1985). In this case a fiber is assumed to have
zero strength with probability φ and unit strength with probability
1-φ. Thus

(1.8)  \Theta_0(x) = \begin{cases}
          0, & x < 0, \\
          -\ln(1-\phi), & 0 \le x < 1, \\
          \infty, & 1 \le x.
       \end{cases}

A second case is where fibers follow a Weibull distribution for
strength:

(1.9)  F_0(x) = 1 - \exp(-x^{\gamma}),  x \ge 0

where γ > 0 is a constant. This is the model studied by Harlow and
Phoenix (1978, 1981, 1982) and Smith (1980, 1982, 1983) among
others. Then Θ_0(x) = x^γ, x ≥ 0. In both these cases the lifetime T
and the strength X are identical under λ(t) = t, and the same will be
true for the composite.
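The standard representative fiber construction (1.3) to (1.5) can be illustrated for the Weibull case. The sketch below assumes the linear load λ(t) = t, so that by (1.7) the CHF is t^γ and the failure time is the smallest t with t^γ ≥ Z. The function names, sample size and seed are hypothetical choices made for this illustration.

```python
import math
import random

def sample_fiber_strength(gamma: float, rng: random.Random) -> float:
    """Draw Z ~ unit exponential (1.3) and return the smallest t with t**gamma >= Z."""
    z = rng.expovariate(1.0)
    return z ** (1.0 / gamma)

def empirical_cdf_at(t: float, gamma: float, n_samples: int = 20000, seed: int = 0) -> float:
    """Fraction of simulated fibers that have failed by load (equivalently time) t."""
    rng = random.Random(seed)
    failed = sum(sample_fiber_strength(gamma, rng) <= t for _ in range(n_samples))
    return failed / n_samples

# The simulated fraction should lie close to the Weibull value (1.9), up to sampling noise.
print(empirical_cdf_at(1.0, 5.0), 1.0 - math.exp(-1.0))
```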

In the case of time dependent fatigue we may consider the CHF
of the form

(1.10)  \Theta(t;\lambda) = \left( \int_0^t \lambda(s)^{\rho} \, ds \right)^{\beta},  t \ge 0,

where ρ > 0 and β > 0 are constants. Various versions of this model
have been studied by Tierney (1982) and Kuo (1983).
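A numerical evaluation of a CHF of this power-law form might look as follows. This is a hedged sketch: it assumes the inner exponent (called rho here) acts on the load and the outer exponent (called beta here) acts on the integral, it uses a plain trapezoidal rule, and the function names and step count are hypothetical.

```python
import math

def chf_fatigue(t, load, rho, beta, steps=2000):
    """Approximate (integral_0^t load(s)**rho ds)**beta by the trapezoidal rule."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (load(0.0) ** rho + load(t) ** rho)
    for i in range(1, steps):
        total += load(i * h) ** rho
    return (total * h) ** beta

def fiber_cdf(t, load, rho, beta):
    """Fiber lifetime distribution from (1.5): F = 1 - exp(-CHF)."""
    return 1.0 - math.exp(-chf_fatigue(t, load, rho, beta))

# Constant load L: the CHF reduces to (L**rho * t)**beta, a Weibull law in time.
L = 2.0
print(chf_fatigue(0.1, lambda s: L, 3.0, 0.5), (L ** 3.0 * 0.1) ** 0.5)
```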
Failure process in a bundle. For a bundle, we assign the
independent random variables Z_1, ..., Z_n, one to each fiber, and
assume these also follow (1.3). Then given particular realizations
of the Z_i's, the bundle load history ℓ, and the load-sharing K_r's we can
solve explicitly for the fiber failure times denoted T_(1), ..., T_(n) and
for the bundle failure time T_n = max{T_(1), ..., T_(n)}. (See Phoenix and
Tierney (1983) for example.) Note that while the bundle load history
is ℓ(t), t ≥ 0, the individual fiber load histories λ_1(t), ..., λ_n(t) may
involve some K_r ℓ(t) as neighbors fail, and these must be used in the
analysis.
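In the static case with ℓ(t) = t, solving for the bundle failure time from given fiber strengths can be sketched as below. This is an illustrative procedure and not the one in Phoenix and Tierney (1983): it assumes a linear bundle, rule (1.1) by default, and hypothetical function names. At each step the survivor that fails at the lowest bundle load is removed, and the load is never allowed to decrease.

```python
def adjacent_failed(failed, i):
    """Number of consecutive failed fibers immediately adjacent to fiber i (both sides)."""
    n = len(failed)
    r = 0
    j = i - 1
    while j >= 0 and failed[j]:
        r += 1
        j -= 1
    j = i + 1
    while j < n and failed[j]:
        r += 1
        j += 1
    return r

def bundle_strength(strengths, k=lambda r: 1.0 + r / 2.0):
    """Bundle load (per fiber) at which the last fiber of a linear bundle fails."""
    n = len(strengths)
    failed = [False] * n
    load = 0.0
    while not all(failed):
        # Bundle load at which each survivor would fail in the current configuration
        thresholds = {i: strengths[i] / k(adjacent_failed(failed, i))
                      for i in range(n) if not failed[i]}
        weakest = min(thresholds, key=thresholds.get)
        load = max(load, thresholds[weakest])
        failed[weakest] = True
    return load

print(bundle_strength([0.5, 10.0, 10.0]))
```

In the example the weak fiber fails at load 0.5, its neighbor then fails at 10/K_1, and the last fiber is already overloaded at that point, so the bundle strength equals 10/1.5.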

Outline of the paper. The key quantity to consider is Q_n(t) = 1 -
G_n(t), t ≥ 0, the probability a bundle survives to time t. (Henceforth
we generally suppress ℓ(t) in the notation unless germane.) In
Section 2 we develop a recursion formula (Theorem 1) for Q_n(t) for
both planar and circular bundles. In Section 3 we obtain the main
limit theorem, Theorem 2. In Section 4 we recast this theorem into
a key approximation for G_n(t) and H_{m,n}(t) which involves two
functions: a characteristic distribution function W(t) and a boundary
function π(t). The form of the approximation is thus H_{m,n}(t) ≈ 1 - [1
- W(t)]^{mn} π(t)^m. Also π(t) = 1 for circular bundles, and typically
deviates negligibly from one for planar bundles. This theorem and
resulting approximation essentially confirm a conjecture first posed
by Harlow and Phoenix (1978) in the static case and Tierney (1982)
in the time dependent case. Harlow (1985) first confirmed the
conjecture in the simplest case, the pure flaw model.

To  use  Theorem  2  in  specific  applications,  three  technical 
conditions,  (3.3),  (3.4)  and  (3.5)  must  be  verified.  Roughly  speaking, 
these  conditions  involve  showing  that  at  time  t  the  probability  of 
having  a  lone  survivor  in  a  bundle  of  n  fibers  divided  by  the  survival 
probability [1 - W(t)]^n is small compared to one, and furthermore
diminishes very rapidly as n → ∞. In Section 5 we verify these
conditions  for  the  static  cases,  giving  ranges  for  the  model 
parameters  under  which  the  results  hold.  However,  generally  we 
cannot  give  justification  in  the  time  dependent  model  without 
introducing  additional  conditions  which  are  physically  justifiable, 
but  seem  to  be  irrelevant  from  numerical  calculations. 
II. RECURSION ANALYSIS. For the most part the dependence of
all quantities on time t will be suppressed in the notation. We
consider only values of t for which F(t;ℓ) < 1 since otherwise the
problem is trivial.

Linear bundles. We first consider linear bundles wherein the
fibers are arranged along a line from left to right. For a given fiber
we let the symbols 'X' and 'O' denote failure and survival,
respectively. A bundle of n fibers clearly has 2^n configurations of
failed and surviving fibers. For example, for n = 6 a possible
configuration is OXXOXO. Failure is defined as the configuration
XXX...X and we let A_n be the set of all 2^n - 1 remaining survival
configurations, that is, all configurations of n fibers with at least
one 'O'. Thus we formally define G_n = Pr{XXX...X} and Q_n ≡ Pr{A_n} = 1 -
G_n.


Next we let E_i be the configuration of i fibers which has all 'X's
except for an 'O' at the very left, that is, E_1 = {O}, E_2 = {OX}, E_3 =
{OXX}, and so on. In the analysis to follow we decompose A_n into the
disjoint subsets A_{n,1}, A_{n,2}, ..., A_{n,n} where A_{n,i} contains all elements
of A_n whose right-most i fibers are in the configuration E_i. Thus

(2.1)  A_n = \bigcup_{i=1}^{n} A_{n,i}

and defining Q_{n,i} ≡ Pr{A_{n,i}} we have

(2.2)  Q_n = \sum_{i=1}^{n} Q_{n,i},  n \ge 1.

For two sets of survival configurations A and B we define the
new set A*B through the operation '*' as the set of all
configurations generated by attaching a configuration from B to the
right end of one from A. By inspection, we have the general
recursive relationships

(2.3)  A_{n+1,1} = A_n * \{O\},  n \ge 1

and

(2.4)  A_{n+1,i} = A_{n,i-1} * \{X\},  2 \le i \le n+1,

starting with A_{1,1} = A_1 = {O}. We will also use the operation '*' to
join two configurations, rather than sets.

Decomposable configurations for linear bundles. A
configuration Y in A_n is said to be decomposable if there exist two
adjacent survivors or 'O's in Y. For example, Y = {OXOOX} is
decomposable whereas Y' = {OXOX} is not. If Y is decomposable then
clearly Y = Y_1 * Y_2 for some Y_1 in A_r and Y_2 in A_{n-r} where 1 ≤ r ≤ n-1
and where Y_1 has a survivor at its right end while Y_2 has a survivor
at its left end. The importance of this concept is that Pr{Y} = Pr{Y_1}
Pr{Y_2}. (To see this one must use the concept of standard
representative fibers as described in Section 1.) In other words, the
probability of a decomposable configuration occurring at time t is
the product of the probabilities for the component configurations
viewed as smaller distinct bundles.
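The product property can be checked by brute force for a small case. The sketch below assumes the pure flaw model (1.8), rule (1.1), a linear bundle and a static load t; it enumerates every pattern of initially flawed fibers, propagates overload failures to a fixed point, and accumulates the probability of each final configuration. All function names are hypothetical.

```python
from itertools import product

def is_decomposable(config: str) -> bool:
    """A linear configuration is decomposable if it has two adjacent survivors."""
    return "OO" in config

def final_config(flawed, t):
    """Propagate overload failures to a fixed point; return the 'O'/'X' string."""
    n = len(flawed)
    failed = list(flawed)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if failed[i]:
                continue
            r = 0
            j = i - 1
            while j >= 0 and failed[j]:
                r += 1
                j -= 1
            j = i + 1
            while j < n and failed[j]:
                r += 1
                j += 1
            if (1.0 + r / 2.0) * t >= 1.0:  # rule (1.1) against unit strength
                failed[i] = True
                changed = True
    return "".join("X" if f else "O" for f in failed)

def config_probs(n, phi, t):
    """Probability of each final configuration of an n-fiber linear bundle."""
    probs = {}
    for flawed in product((False, True), repeat=n):
        j = sum(flawed)
        c = final_config(flawed, t)
        probs[c] = probs.get(c, 0.0) + phi ** j * (1.0 - phi) ** (n - j)
    return probs

p2, p3, p5 = (config_probs(n, 0.1, 0.6) for n in (2, 3, 5))
# OXOOX = OXO * OX, split between the two adjacent survivors
print(p5["OXOOX"], p3["OXO"] * p2["OX"])
```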

For  certain  configurations  which  cannot  be  decomposed,  we 
will  later  need  bounding  probabilities  written  in  terms  of 
probabilities  for  smaller  configurations,  as  in  the  following  lemma. 

Lemma 1. Let Y_1 ∈ A_n and Y_2 ∈ A_m such that either Y_1 has an 'O' at its
right end, or Y_2 has an 'O' at its left end. Then for Y = Y_1 * Y_2 ∈ A_{n+m}
we have

(2.5)  Pr\{Y\} \le Pr\{Y_1\} \, Pr\{Y_2\}.

Proof: To prove this lemma think in terms of the standard
representative fibers of Section 1 where Z_1, ..., Z_n and Z_{n+1}, ..., Z_{n+m}
are associated with the fibers which may yield the respective
configurations Y_1 and Y_2, and altogether the configuration Y. First, if
Z_1, ..., Z_{n+m} have values such that Y ∈ A_{n+m} results at time t where
fiber n is surviving while fiber n+1 is failed, then these same values
will automatically produce Y_1 ∈ A_n and Y_2 ∈ A_m. The reverse,
however, is not true in that some survival configuration other than Y
may result for the bundle of size m+n. Hence Pr{Y} ≤ Pr{Y_1} Pr{Y_2},
proving the lemma.

Special non-decomposable configurations. Two sets are
crucial to the analysis: The first is F_{n,i} which is the subset of A_{n,i}
whose elements have an 'O' on the left and are not decomposable.
The second is R_{n,i} which is the subset of A_{n,i} whose elements have an
'X' on the left and are also not decomposable. Define f_{n,i} ≡ Pr{F_{n,i}} and
r_{n,i} ≡ Pr{R_{n,i}} and also set r_{0,1} ≡ 1, r_{n,n} ≡ 0 and otherwise r_{j,i} ≡ f_{j,i} ≡ 0
for all j < i.

By inspection we see that

(2.6)  R_{n,1} = \bigcup_{i=2}^{n} \tilde{F}_{n,i},  n \ge 2,

where the symbol ~ above a set means that each of its
configurations has its entries written down in reverse order. For
example, if B = {XO, XOXXO} then \tilde{B} = {OX, OXXOX}. (We also apply ~ to
a single configuration with the same meaning.) Furthermore, by
studying  survival  configurations  for  small  n  we  see  the  structure 


(2.7) 


=  {0} 


F 

F 


2.1 


=  {0} 


n.1  “ 


n  - 1 
U 
i-2 


n£  3 


where we assume ∅ * Y = Y * ∅ = ∅ for any Y in A_n. It is also true
that

(2.8)  F_{n+1,1} = \{O\} * R_{n,1}


and 

(2.9) 


Rn+1,1 


.n-1- 
(  VJ  R  , 

i-2  0,1 


u(XX  ...  X})  *  {0}  . 


We  now  obtain  some  key  recursions. 


Lemma 2. For linear bundles we have the recursion

(2.10)  Q_{n,i} = \sum_{j=1}^{n} Q_{n-j,1} f_{j,i} + r_{n,i},  1 \le i \le n,

starting with Q_{0,1} = 1 and Q_{0,i} ≡ 0 for i ≥ 2.
Proof: It suffices to prove the case i = 1. If n = 1 the result is obvious. Next take
n ≥ 2, and suppose Y ∈ A_{n,1} but Y is not decomposable. Then either Y ∈ F_{n,1} or
Y ∈ R_{n,1}, accounting for j = n in the sum. On the other hand, if Y ∈ A_{n,1} \ (F_{n,1} ∪
R_{n,1}), then Y is decomposable and there exists some j such that Y ∈ A_{n-j,1} * F_{j,1}
and 1 ≤ j ≤ n-1. The sum follows from disjointness of the various
configurations.

Circular  bundles.  In  the  case  of  circular  bundles,  the  previous  analysis 
must  be  modified.  Again  we  label  the  fibers  consecutively  from  1  to  n  starting 
arbitrarily,  but  fibers  1  and  n  are  now  adjacent.  We  still  write  out  a  configuration 
in  a  linear  fashion,  though  with  this  adjacency  of  the  first  and  last  fibers 
understood.  This  latter  aspect  forces  a  modification  of  the  earlier  concept  of  a 
decomposable  configuration.  To  be  decomposable  a  configuration  Y  for  a 
circular  bundle  must  now  have  either  three  adjacent  survivors,  or,  two  or  more 
pairs  of  adjacent  survivors. 

We define A_n and Q_n as before but for n ≥ 2 let A_{n,0} be the subset of A_n
whose elements have two or more adjacent survivors. Furthermore, for n ≥ 3
we partition A_{n,0} into two subsets A_{n,01} and A_{n,02} where A_{n,01} contains exactly
those elements of A_{n,0} which are decomposable, and A_{n,02} contains the rest.
Lastly we let Q_{1,0} = Pr{O}, Q_{2,0} = Pr{OO} and Q_{n,0} = Pr{A_{n,0}}, n ≥ 3.


Lemma  3.  For  circular  bundles  we  have  the  recursion 


(2.11) 


Xq„, 

j«i 


i.o  fj,i 


n£  1  , 


where Q_{0,0} ≡ 0 and f_{n,1} is as defined for linear bundles.


Proof:  See  Kuo  and  Phoenix  (1987). 

Theorem 1. For both linear and circular bundles we have the recursion

(2.12)  Q_n = \sum_{j=1}^{n} Q_{n-j} f_{j,1} + \xi_n,

starting with Q_0 ≡ 1, where for linear bundles

(2.13)  \xi_n = \sum_{m=0}^{n} r_{n-m,1} r_{m,1} + \sum_{j=2}^{n} r_{n,j} - \sum_{m=1}^{n} \sum_{j=2}^{n} f_{n-m,1} r_{m,j}

and for circular bundles

(2.14)  \xi_n = n f_{n,1} + r^{(0)}_{n,1} - \sum_{j=1}^{n} r^{(0)}_{n-j,1} f_{j,1}

where r^{(0)}_{n,1} is analogous to r_{n,1} for linear bundles (and must include all unique
rotations). The proof of this theorem is long and will appear in Kuo and
Phoenix (1987).


III. BEHAVIOR OF Q_n AS n GROWS LARGE. We show here that for both
linear and circular bundles Q_n has the structure Q_n = x^{-n}(π + o_n) for suitable
functions x, π and o_n where o_n → 0 as n → ∞. Also, x is the same for both
bundles, and π, which apparently reflects edge effects, is identically one
in the circular case. We begin with some key lemmas, definitions and
assumptions.


Lemma 4. For linear bundles

(3.1)  \sum_{n=0}^{\infty} f_{n,1} \le 1.

Proof: Recall (2.20) and take i = 1. Since r_{0,1} = 1 we have R_1(s) > 1
whence F_1(s) < 1. In view of (2.19) Abel's lemma (Karlin and Taylor
(1975)) gives us (3.1).

Next let x be the solution to

(3.2)  \sum_{n=1}^{\infty} f_{n,1} x^n = 1

and note that x ≥ 1 by Lemma 4, since f_{0,1} = 0.

Technical conditions. We now make some technical assumptions
needed later. For a linear bundle of n fibers we recall E_n was the
configuration {OXX ... X}. Let e_n ≡ Pr{E_n} and assume two conditions are
satisfied, namely

(3.3)  \sum_{n=2}^{\infty} e_n x^n < 1

and

(3.4)  \sum_{n=1}^{\infty} n e_n x^n < \infty.

We also let h_n be the probability that only one fiber survives in a linear
bundle of n fibers, and assume

(3.5)  \sum_{n=1}^{\infty} h_n x^n < \infty.

(By the principle of monotone convergence these sums will indeed
converge.) Lastly we assume t is such that x is finite. This is guaranteed
if F(t;ℓ) < 1.

We  now  give  several  lemmas  whose  proofs  appear  in  Kuo  and  Phoenix 
(1987). 

Lemma 5. For linear bundles

(3.6)  \sum_{n=1}^{\infty} n f_{n,1} x^n < \infty.

Lemma 6. For linear bundles

(3.7)  R_1(x) < \infty.

Lemma 7. For linear bundles

(3.8)  R(x) < \infty.

Lemma 8. For circular bundles

(3.9)  R_{1,01}(x) < \infty.

Lemma 9. For both circular and linear bundles the sequence
{Q_n x^n}_{n=0}^{\infty} is bounded.

Lemma 10. For both linear and circular bundles

(3.10)  \sum_{n=1}^{\infty} |\xi_n| x^n < \infty.

Thus we may state the key result.

Theorem 2. For both linear and circular bundles

(3.11)  \lim_{n \to \infty} Q_n x^n = \pi

where

(3.12)  \pi = \begin{cases}
           R_1(x)^2 \Big/ \sum_{n=1}^{\infty} n f_{n,1} x^n & \text{for linear bundles,} \\
           1 & \text{for circular bundles.}
        \end{cases}


Proof: From Theorem 1 we may write the renewal equation

(3.13)  Q_n x^n = \sum_{i=1}^{n} Q_{n-i} x^{n-i} f_{i,1} x^i + \xi_n x^n

where ξ_n is given respectively by (2.13) and (2.14) for linear and
circular bundles. To (3.13) we may apply a key theorem in the theory
of the renewal equation as given in Karlin and Taylor (1975). The
key conditions for this theorem are (3.2), Lemma 5, Lemma 9, Lemma
10, and

(3.14)  \gcd\{ n \mid f_{n,1} > 0 \} = 1,

which is obvious. The theorem yields (3.11) where

(3.15)  \pi = \sum_{n=0}^{\infty} \xi_n x^n \Big/ \sum_{n=1}^{\infty} n f_{n,1} x^n.

In the linear case ξ_0 = r_{0,1} and (2.13) yields

(3.16)  \sum_{n=0}^{\infty} \xi_n x^n = R_1(x)^2 + \sum_{i=2}^{\infty} R_i(x) - F_1(x) \sum_{i=2}^{\infty} R_i(x) = R_1(x)^2

in view of (3.2). In the circular case ξ_0 = r^{(0)}_{0,1} = 1, and (2.14) similarly yields

(3.17)  \sum_{n=0}^{\infty} \xi_n x^n = \sum_{n=1}^{\infty} n f_{n,1} x^n.

Thus (3.12) follows from (3.15) to (3.17), proving the theorem.
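The circular case of Theorem 2 can be checked numerically in a small example. The sketch below is an illustration under assumptions, not part of the proof: it takes the pure flaw model (1.8) with rule (1.1) and K'_{n-1} = n, a load t = 0.6 (so a sound fiber fails once r ≥ 2), and the values f_{1,1} = 1 - φ, f_{3,1} = φ(1 - φ)^2 with all other f_{n,1} = 0 that hold for this load, so that (3.2) is a cubic in x. Q_n is obtained by enumerating all initial flaw patterns. Function names are hypothetical.

```python
from itertools import product

def survives_circular(flawed, t):
    """True if at least one fiber of a circular pure-flaw bundle survives load t."""
    n = len(flawed)
    failed = list(flawed)
    changed = True
    while changed and not all(failed):
        changed = False
        for i in range(n):
            if failed[i]:
                continue
            left = 0
            while left < n - 1 and failed[(i - 1 - left) % n]:
                left += 1
            right = 0
            while right < n - 1 - left and failed[(i + 1 + right) % n]:
                right += 1
            r = left + right
            k = float(n) if r == n - 1 else 1.0 + r / 2.0  # K'_{n-1} = n for a lone survivor
            if k * t >= 1.0:
                failed[i] = True
                changed = True
    return not all(failed)

def q_circular(n, phi, t):
    """Bundle survival probability Q_n by enumeration of initial flaw patterns."""
    q = 0.0
    for flawed in product((False, True), repeat=n):
        if survives_circular(flawed, t):
            j = sum(flawed)
            q += phi ** j * (1.0 - phi) ** (n - j)
    return q

def solve_x(phi):
    """Root of (1 - phi) x + phi (1 - phi)^2 x^3 = 1 by bisection on [1, 1/(1 - phi)]."""
    lo, hi = 1.0, 1.0 / (1.0 - phi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (1.0 - phi) * mid + phi * (1.0 - phi) ** 2 * mid ** 3 < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

phi, t, n = 0.05, 0.6, 10
print(q_circular(n, phi, t) * solve_x(phi) ** n)  # should be close to pi = 1
```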
IV. BEHAVIOR OF H_{m,n} AS m AND n GROW LARGE. We now
recast the results of Theorem 2 into a more useful form from the
point of view of applications. Let

(4.1)  W(t) = 1 - 1/x(t),  t \ge 0.

Then since G_n(t) = 1 - Q_n(t) we may recast Theorem 2 as

(4.2)  G_n(t) = 1 - [1 - W(t)]^n [\pi(t) + o_n(t)],  t \ge 0,

where o_n(t) → 0 as n → ∞ for each t ≥ 0. From (1.2) the distribution
function for the failure time of the composite is

(4.3)  H_{m,n}(t) = 1 - [1 - W(t)]^{mn} [\pi(t) + o_n(t)]^m,  t \ge 0.

Shortly we show that W(t) is typically a proper distribution
function in t ≥ 0. Also π(t), which is identically one for circular
bundles (Theorem 2), is typically very close to one for linear bundles
and usually π(0) = 1. It appears that π(t) plays the role of a bundle
edge term, and may be neglected for larger n. Thus when m and n are
both large and of the same order, the resulting approximation is

(4.4)  H_{m,n}(t) \approx 1 - [1 - W(t)]^{mn},  t \ge 0.

Of course, the accuracy of this approximation depends on the speed
with which o_n(t) → 0. Limited numerical studies show that once n
reaches a moderate size, o_n(t) decreases by orders of magnitude
with each unit increase in n, so that the convergence is extremely
fast.


Because of its importance we call W(t), t ≥ 0 the
characteristic distribution function for failure. To see that it is
indeed a distribution function we note that for circular bundles

W(t) = 1 - [Q_n(t)]^{1/n} [1 + o_n(t)]^{-1/n}.

Since Q_n(t) is nonincreasing in t and o_n(t) → 0 as n → ∞ for each t ≥
0, it is easy to argue that W(t) must be nondecreasing. We recall x(t)
≥ 1 and from the definition (3.2) of x(t) we have f_{1,1}(t)x(t) ≤ 1. Since
f_{1,1}(t) = Pr{O} = 1 - F(t;ℓ) we use (4.1) to obtain

(4.5)  0 \le W(t) \le F(t;\ell),  t \ge 0.
Now as t → ∞ we have F(t;ℓ) → 1 but it is more difficult to argue
that W(t) → 1 since this requires x(t) → ∞ and this is not easily
seen from the definition (3.2) of x(t). However, a lower bound W*(t)
on W(t) can be obtained in many cases which satisfies W*(t) → 1 as t
→ ∞. Multiplying (3.7) by x^n and summing on n leads to

(4.6)  \sum_{i=1}^{\infty} e_i x^i \le 1.

Unfortunately, simple expressions for e_i(t) are not usually possible,
but in applications one can usually show that

(4.7)  e_i(t) \le A \, B(t)^i,  t \ge 0,

where A is a positive constant and B(t) is some positive function satisfying B(t)
→ 0 as t → ∞. Then x*(t) which solves

(4.8)  A \sum_{i=1}^{\infty} (x^*(t) B(t))^i = 1

will be a lower bound on x(t). Since x*(t)B(t) must be a constant in
(4.8) we will have x*(t) → ∞ and W(t) → 1 as t → ∞. Loosely speaking
condition (4.8) will tend to be satisfied when the survival
probability for a single fiber diminishes sharply to zero with
increasing K_i. (Recall e_i = Pr{OXX...X}.) This will depend on both the
upper tail behavior of F(t;λ) and how fast K_r grows in r.


Numerical Calculation of W(t). The exact calculation of W
requires the calculation of x using (3.2). We let B_r be the r × r
matrix

(4.9)  B_r = \begin{pmatrix}
          0 & 1 & & & \\
          0 & 0 & 1 & & \\
          \vdots & & & \ddots & \\
          0 & 0 & 0 & \cdots & 1 \\
          f_{r,1} & f_{r-1,1} & \cdots & f_{2,1} & f_{1,1}
       \end{pmatrix}

where we recall

(4.10)  f_{1,1} = Pr{O}
        f_{2,1} = 0
        f_{3,1} = Pr{OXO}
        f_{4,1} = Pr{OXXO}
        f_{5,1} = Pr{OXOXO, OXXXO}
        f_{6,1} = Pr{OXXXXO, OXXOXO, OXOXXO}

these being dependent on t. Then 1 - W = 1/x is the spectral radius
of the infinite matrix

(4.11)  B_{\infty} = \lim_{r \to \infty} B_r.

To calculate 1/x numerically, first calculate in succession 1/x_1,
1/x_2, ... as the largest eigenvalues of the respective matrices B_1,
B_2, .... This requires being able to calculate probabilities for the
configurations in (4.10), and this is usually possible for
configurations up to length 12 or so. Since x_r → x as r → ∞ choose x_r
where r is large enough for the convergence to be essentially
complete. In this regard note that H_{m,n}(t) ≈ mnW(t) according to
(4.4), where mn is typically very large (say 10^9). Thus "essentially
complete" means that changes in mn(x_r - 1)/x_r must be small
compared to one. In any case x_r ≥ x so that W_r ≡ 1 - 1/x_r will be an
upper bound on W. In applications, r of the order of 10 often
suffices.
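The successive eigenvalue calculation can be sketched without a linear algebra library, since power iteration on the companion matrix B_r is the same as iterating the linear recurrence u_k = f_{1,1} u_{k-1} + ... + f_{r,1} u_{k-r} and tracking the ratio of successive terms. The example below is only an illustration: the f_{n,1} values are those quoted in Section V for the pure flaw model when 1/K_3 ≤ t < 1/K_2, the function names and iteration count are hypothetical, and convergence relies on f_{1,1} > 0.

```python
def pure_flaw_f(r, phi):
    """f_{1,1}, ..., f_{r,1} for the pure flaw model when 1/K_3 <= t < 1/K_2."""
    f = []
    for n in range(1, r + 1):
        if n % 2 == 1:
            f.append(phi ** ((n - 1) // 2) * (1.0 - phi) ** ((n + 1) // 2))
        elif n == 4:
            f.append(phi ** 2 * (1.0 - phi) ** 2)
        else:
            f.append(0.0)
    return f

def largest_eigenvalue(f, iters=500):
    """Largest eigenvalue 1/x_r of B_r via the recurrence u_k = sum_j f_j u_{k-j}."""
    r = len(f)
    window = [0.0] * (r - 1) + [1.0]  # last r terms of the sequence, newest last
    ratio = 0.0
    for _ in range(iters):
        new = sum(f[j] * window[-1 - j] for j in range(r))
        ratio = new / window[-1]
        # Rescale so the newest term is 1; this avoids underflow over many steps
        window = [w / new for w in window[1:]] + [1.0]
    return ratio

phi = 0.05
for r in (1, 3, 5, 10):
    print(r, 1.0 - largest_eigenvalue(pure_flaw_f(r, phi)))  # W_r, an upper bound on W
```

The printed W_r values should be nonincreasing in r, consistent with x_r ≥ x.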

Behavior of π(t). As mentioned, π(t) appears to play the role of
a boundary or edge term, and is identically one for circular bundles.
For linear bundles it may be shown that π(0) = 1 when ℓ(0) = 0 even
when F(0;ℓ) = φ > 0 as in the pure flaw model. This is shown in Kuo
and Phoenix (1987).


V. APPLICATIONS AND VERIFICATION OF TECHNICAL
ASSUMPTIONS. To apply the previous results in specific cases, we
must verify the key technical assumptions (3.3) to (3.5). Here we do
this for the 'pure flaw' model for fibers to illustrate some useful
procedures and difficulties.

'Pure flaw' model for fibers. We recall the simple model (1.8)
where a fiber has unit strength with probability 1-φ or has zero
strength with probability φ. The composite loading we recall is ℓ(t)
= t, t ≥ 0. Before beginning, we mention that Harlow (1985) used a
very different recursive approach to study the planar case of the
model. He arrived at essentially the same structure (4.3) for H_{m,n}(t)
though it is difficult to demonstrate that all his quantities are
equivalent to ours.

Considering t = 0 first, we are able to evaluate all the major
quantities. First we show W(0) = 0 even though F(0;ℓ) = φ > 0. When t
= 0, it may be shown that (4.6) is

(5.1)  \sum_{i=1}^{\infty} e_i x^i = 1,  (t = 0).

Since e_i = Pr{E_i} = φ^{i-1}(1-φ) we may evaluate the sum in (5.1) to
obtain

(5.2)  (1-\phi)\phi x / (\phi(1-\phi x)) = 1.

This yields x = 1 so that W(0) = 0. At the end of Section 4 we
pointed out that π(0) = 1 for both circular and linear bundles.
Turning to the three conditions (3.3) to (3.5) we first note that h_n =
n e_n so that the third is equivalent to the second. Since e_n = φ^{n-1}(1-φ)
and x = 1 we have

\sum_{n=2}^{\infty} e_n x^n = \phi < 1  and  \sum_{n=1}^{\infty} n e_n x^n = 1/(1-\phi) < \infty.

Finally, it is easy to see that G_n(0) = φ^n so from (4.2) the residue
term is o_n(0) = -φ^n.

Next we consider t such that 0 < t < 1, and we choose k such
that K_{k-1} t < 1 ≤ K_k t. The interpretation of k is that under the
composite load t an intact fiber will fail once it develops k failed
neighbors (counting on both sides). To verify the three conditions
(3.3) to (3.5) it is easiest to use a simple upper bound on x(t),
namely 1/(1-F(t;ℓ)) = 1/(1-φ). Also e_n = φ^{n-1}(1-φ) for 1 ≤ n ≤ k and is
zero otherwise. Thus for the first condition (3.3),

(5.3)  \sum_{n=2}^{\infty} e_n x^n \le \sum_{n=1}^{\infty} \phi^n (1-\phi)(1-\phi)^{-(n+1)} = \phi/(1-2\phi),

which is less than one provided φ < 1/3. For the second and third
conditions (again h_n = n e_n)

(5.4)  \sum_{n=1}^{\infty} n e_n x^n \le \frac{1-\phi}{\phi} \sum_{n=1}^{\infty} n [\phi/(1-\phi)]^n = (1-\phi)^2/(1-2\phi)^2,

which is finite for φ < 1/2. Thus all conditions are met
independently of t for φ < 1/3. The case t = 1 is trivial since G_n(1) = 1.
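The closed forms on the right of (5.3) and (5.4) are geometric-series sums and can be confirmed numerically. The sketch below is only a check of that arithmetic: it replaces x by its upper bound 1/(1 - φ), takes e_n = φ^{n-1}(1 - φ) for all n, and truncates the series at a hypothetical number of terms.

```python
def bounding_sums(phi, terms=400):
    """Partial sums of the bounds in (5.3) and (5.4), with x replaced by 1/(1 - phi)."""
    q = phi / (1.0 - phi)
    first = sum(q ** n for n in range(1, terms))                           # bound in (5.3)
    second = (1.0 - phi) / phi * sum(n * q ** n for n in range(1, terms))  # bound in (5.4)
    return first, second

phi = 0.2
first, second = bounding_sums(phi)
print(first, phi / (1.0 - 2.0 * phi))
print(second, (1.0 - phi) ** 2 / (1.0 - 2.0 * phi) ** 2)
```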

Turning to the calculation of W(t), the simplest situation is
when 1/K_1 ≤ t < 1. Studying (4.10) we get f_{1,1} = (1-φ) and f_{n,1} = 0 for
n ≥ 2. Thus (3.2) yields x = 1/(1-φ) so W(t) = φ. The next simplest
situation is when 1/K_2 ≤ t < 1/K_1 so that (4.10) yields f_{1,1} = (1-φ),
f_{2,1} = 0, f_{3,1} = φ(1-φ)^2 and f_{n,1} = 0 for n ≥ 4. Thus (3.2) yields

(5.5)  (1-\phi) x + \phi (1-\phi)^2 x^3 = 1.

While we could solve for x explicitly, we are usually interested in
small values of φ in applications. We find x = 1/(1-2φ^2) + O(φ^3)
whence W(t) = 2φ^2 + O(φ^3). The next easiest case is 1/K_3 ≤ t < 1/K_2.
Studying (4.10) we find the new f_{n,1}'s are f_{4,1} = φ^2(1-φ)^2 but
otherwise f_{n,1} = 0 for n even and f_{n,1} = φ^{(n-1)/2}(1-φ)^{(n+1)/2} for n odd.
The series (3.2) may be evaluated to yield

(5.6)  (1-\phi) x + \phi(1-\phi) x^2 + \phi^2 (1-\phi)^2 x^4 - \phi^3 (1-\phi)^3 x^6 = 1,

where in the process we find x < [φ(1-φ)]^{-1/2}. Then x is the real
solution to (5.6), and must be determined numerically. For small φ
we find x = 1/(1 - 3φ^3) + O(φ^4) whence W(t) = 3φ^3 + O(φ^4).
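The numerical solution of (5.5) and (5.6) and the comparison with the small-φ expansions can be sketched as follows. This is an illustration under assumptions: φ is taken small enough that each left-hand side is increasing on the bracket [1, 1/(1 - φ)], bisection is used for simplicity, and the function names are hypothetical.

```python
def bisect_root(g, lo, hi, iters=200):
    """Root of an increasing function g on [lo, hi] with g(lo) < 0 <= g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def w_from_55(phi):
    """W = 1 - 1/x with x solving (5.5), valid for 1/K_2 <= t < 1/K_1."""
    g = lambda x: (1.0 - phi) * x + phi * (1.0 - phi) ** 2 * x ** 3 - 1.0
    return 1.0 - 1.0 / bisect_root(g, 1.0, 1.0 / (1.0 - phi))

def w_from_56(phi):
    """W = 1 - 1/x with x solving (5.6), valid for 1/K_3 <= t < 1/K_2."""
    q = phi * (1.0 - phi)
    g = lambda x: (1.0 - phi) * x + q * x ** 2 + q ** 2 * x ** 4 - q ** 3 * x ** 6 - 1.0
    return 1.0 - 1.0 / bisect_root(g, 1.0, 1.0 / (1.0 - phi))

phi = 0.05
print(w_from_55(phi), 2.0 * phi ** 2)  # leading term 2 phi^2
print(w_from_56(phi), 3.0 * phi ** 3)  # leading term 3 phi^3
```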

For smaller t and larger k the f_{n,1} in (4.10) become more complicated.
However, it appears to be generally true that

(5.7)  W(t) = k \phi^k + O(\phi^{k+1}),  1/K_k \le t < 1/K_{k-1}

for k = 1, 2, .... Thus we see that W(0) = 0 and W(t) increases in steps
at the time points t_k = 1/K_k, k = 1, 2, ... where the number of steps
becomes infinite as t ↓ 0. The above results agree with those of
Harlow (1985), who points out that W(t) ≤ kφ^k at least for 1 ≤ k ≤ 5.
Turning to π(t) for the case of linear bundles, we find from
(3.12) that π(t) = 1 for 1/K_1 ≤ t < 1 and π(t) = 1 + 3φ^2 + O(φ^3) for 1/K_2
≤ t < 1/K_1. In general it appears that

(5.8)  \pi(t) = 1 + (k-1)(k+1) \phi^k + O(\phi^{k+1}),  1/K_k \le t < 1/K_{k-1}.

Lastly we note that Harlow (1985) numerically calculated the
maximum deviation

(5.9)  \max_t \left| H_{m,n}(t) - \{ 1 - [1 - W(t)]^{mn} \} \right|

for linear bundles and various combinations of m, n and φ in order to
study the error in the approximation (4.4). First his results suggest
that (5.7) is an extremely accurate approximation for W(t) for φ ≤ 0.1
and k up to 12, which is as far as his results go. Second, almost all
the deviation he observed in (5.9) can be accounted for by using
the approximation (5.8) for π(t) instead of putting π(t) = 1. In other
words, the boundary effects in the planar composites, though small,
seem to dominate the residue o_n(t).

For fibers with Weibull strength (see (1.9)), the calculation of
W(t) and π(t) must be done numerically and will not be considered
here.  Insight  into  their  behavior  can  be  obtained  from  Harlow  and 
Phoenix  (1981,  1982)  where  a  different  recursive  approach  was  used 
to  study  the  first  occurrence  of  k  adjacent  breaks.  In  fact,  the 
results  here  essentially  verify  a  conjecture  which  arose  there. 

For the time-dependent fatigue model (1.10), verification of
conditions (3.3) to (3.5) has proven to be elusive except for ρ = β = 1.

A practical solution is to restrict the load on a fiber to ℓ_max,
that is, to take F(t;ℓ) = 1 as soon as ℓ(t) on a fiber exceeds ℓ_max. In
practice ℓ_max would be the theoretical atomic bond strength for the
material. With this limitation, if k is chosen such that K_{k-1} L < ℓ_max ≤
K_k L then the sums in conditions (3.3) to (3.5) need only be considered
for n up to k. Numerical calculations can be carried out for k up to
about 10, and for K_k = 1 + k/2 this means for L ≥ ℓ_max/6. This happens
to be sufficient for many applications. Numerical results suggest
that the conditions hold for ρβ > 3, and in fact, ℓ_max, if sufficiently
large, seems to have little to do with the convergence.
ABSTRACT. This project focuses on the development of new,
multidimensional algorithms for direct acoustic propagation and
generalized acoustic tomography at the level of the scalar Helmholtz
equation. The general aim is the continued detailed development of the
ideas originally outlined several years ago. Phase space, or
"microscopic," methods and path (functional) integral representations
provide the appropriate framework to extend homogeneous Fourier methods
to inhomogeneous environments. The path integrals furnish the principal
representation of the Helmholtz propagator and, subsequently, through
direct computation, the basis for the direct numerical algorithms. There
are two complementary approaches to the analysis and computation of the
n-dimensional Helmholtz propagator. The first is essentially a
factorization/parabolic-based (one-way) phase space path
integration/invariant imbedding approach. This results in a marching
algorithm which generalizes the Tappert/Hardin split-step FFT algorithm
for one-way wave propagation, a nonperturbative incorporation of
backscatter effects which generalizes Kennett's algorithm in reflection
seismology for two-way wave propagation, and the basis for the
formulation and solution of corresponding arbitrary-dimensional
nonlinear inverse problems. The numerical algorithms based on these
modern, "microscopic" methods directly compute pseudo-differential and
Fourier integral operators, incorporate phase space filtering, and are
ideally suited for computers which provide either a vector or a parallel
pipe type of operation. Extensive testing has, so far, been very
promising. While the first approach starts from a transversely
inhomogeneous formulation and, subsequently, builds in backscatter
effects, the second approach constructs elliptic-based (two-way) path
integral representations of the propagator for general range-dependent
environments from the outset. A particular approximate path integral
construction (Feynman/Garrod) results in a true path functional,
suggesting the underlying stochastic foundations of the Helmholtz
equation. It appears to be a viable computational approximation for a
useful range of propagation experiments and can be numerically evaluated
by standard Monte Carlo (statistical) methods. A more detailed
examination and approximate construction of the underlying stochastic
process would provide for both more accurate and widely applicable path
integral representations and direct numerical simulation techniques.

I. INTRODUCTION. Direct wave propagation modeling plays a significant
role in such fields as underwater communication, radio transmission
through the atmosphere, laser propagation, and earthquake prediction.
Likewise, the corresponding inverse problems are at the heart of such
areas as submarine detection, CAT scan technology, soft-tissue
diffraction tomography, the mapping of the interior earth, and oil
exploration. In all of these and many other examples, relatively fast
and accurate numerical algorithms are necessary.

The analysis and fast, accurate numerical computation of the wave
equations of classical physics are often quite difficult for rapidly
changing, multidimensional environments extending over many wavelengths.
For the most part, classical, "macroscopic" methods have resulted in
direct wave field approximations (perturbation theory, ray-theory
asymptotics, modal analysis, hybrid ray-mode methods), derivations of
approximate wave equations (scaling analysis, field splitting
techniques, formal operator expansions), and discrete numerical
approximations (finite differences, finite elements, spectral methods).
In the last several decades, however, mathematicians studying linear
partial differential equations have developed, in the language of
physicists, a sophisticated, "microscopic" phase space analysis. In
conjunction with the global functional integral techniques pioneered by
Wiener (Brownian motion) and Feynman (quantum mechanics), and so
successfully applied today in quantum field theory and statistical
physics, the n-dimensional classical physics propagators can be both
represented explicitly and computed directly. The phase space, or
"microscopic," methods and path (functional) integral representations
provide the appropriate framework to extend homogeneous Fourier methods
to inhomogeneous environments, in addition to suggesting the basis for
the formulation and solution of corresponding arbitrary-dimensional
nonlinear inverse problems. Moreover, it is in phase space, rather than
in configuration space, that, from a mathematical perspective, the
interesting geometry takes place.

II. PHASE SPACE AND PATH INTEGRAL CONSTRUCTIONS. For the n-dimensional
scalar Helmholtz equation, there are two complementary approaches to
this analysis and computation, as illustrated in Figure 1. The first is
essentially a factorization/path integration/invariant imbedding
approach. For transversely inhomogeneous environments, implying medium
homogeneity with respect to a single distinguished direction, the
n-dimensional Helmholtz equation can be exactly factored into separate,
physical forward and backward, one-way wave equations, following from
spectral analysis [1-5]. The forward evolution (one-way) equation

$$ (i/\bar{k})\,\partial_x \phi^+(x,\mathbf{x}_t) + \left(K^2(\mathbf{x}_t) + (1/\bar{k}^2)\nabla_t^2\right)^{1/2} \phi^+(x,\mathbf{x}_t) = 0 , \tag{1} $$

where $K(\mathbf{x})$ is the refractive index field and $\bar{k}$ is a
reference wave number, is the formally exact wave equation for
propagation in a transversely inhomogeneous half-space supplemented with
appropriate outgoing wave radiation and initial-value conditions. While
functions of a finite set of commuting self-adjoint operators can be
defined through spectral theory, functions of noncommuting operators are
represented by pseudo-differential operators [2,5]. The formal wave
equation (1) is now written explicitly as a Weyl pseudo-differential
equation in the form


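$$ (i/\bar{k})\,\partial_x \phi^+(x,\mathbf{x}_t) + (\bar{k}/2\pi)^{n-1} \int_{R^{2n-2}} d\mathbf{x}_t'\, d\mathbf{p}_t\; \Omega_B(\mathbf{p}_t, (\mathbf{x}_t' + \mathbf{x}_t)/2)\, \exp(i\bar{k}\,\mathbf{p}_t\cdot(\mathbf{x}_t - \mathbf{x}_t'))\, \phi^+(x,\mathbf{x}_t') = 0 \tag{2} $$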
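In Eq. (2), the symbol $\Omega_B(\mathbf{p},\mathbf{q})$ associated with
the square root Helmholtz operator
$B = (K^2(\mathbf{q}) + (1/\bar{k}^2)\nabla_q^2)^{1/2}$ satisfies the Weyl
composition equation

$$ \Omega_{B^2}(\mathbf{p},\mathbf{q}) = K^2(\mathbf{q}) - \mathbf{p}^2 = (\bar{k}/\pi)^{2n-2} \int_{R^{4n-4}} d\mathbf{t}\,d\mathbf{x}\,d\mathbf{y}\,d\mathbf{z}\; \Omega_B(\mathbf{t}+\mathbf{p}, \mathbf{x}+\mathbf{q})\, \Omega_B(\mathbf{y}+\mathbf{p}, \mathbf{z}+\mathbf{q})\, \exp(2i\bar{k}(\mathbf{x}\cdot\mathbf{y} - \mathbf{t}\cdot\mathbf{z})) \tag{3} $$

with $\Omega_{B^2}(\mathbf{p},\mathbf{q})$ the symbol associated with the
square of B, $B^2 = (K^2(\mathbf{q}) + (1/\bar{k}^2)\nabla_q^2)$ [2,3,5].
The generalized Fourier construction procedure for the square root
Helmholtz operator can be summarized pictorially by the following
correspondence diagram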

$B^2 \Leftrightarrow \Omega_{B^2}$

t  | 

$B \Leftrightarrow \Omega_B$

where the arrows symbolize the one- and two-way mappings between the
appropriate quantities.

Exact solutions of the Weyl composition equation (3) can be constructed
in several cases [6]. For example, the symbol $\Omega_B(p,q)$ for the
two-dimensional (n = 2) quadratic medium,
$K^2(q) = K_0^2 + \omega^2 q^2$, is given by [6]

$$ \Omega_B(p,q) = -\left(\exp(i\pi/4)\,\epsilon^{1/2}/\pi^{1/2}\right) \int_0^\infty dt\, \exp(i(Yt + X\tanh t))\; t^{-1/2}\, \left(iY\,\mathrm{sech}\,t + iX\,\mathrm{sech}^3 t - (\mathrm{sech}\,t)(\tanh t)\right) \tag{4} $$

with $X = (1/\epsilon)(\omega^2 q^2 - p^2)$, $Y = K_0^2/\epsilon$, and
$\epsilon = \omega/\bar{k}$. Consistent with taking the square root of the
indefinite Helmholtz operator, the corresponding symbols, generally,
have both real and imaginary parts characterized by oscillatory behavior
[4,6], as illustrated in Figure 2. Nonuniform and uniform perturbation
solutions corresponding to definite physical limits (frequency,
propagation angle, field strength, field gradient) recover several known
approximate wave theories (ordinary parabolic, range-refraction
parabolic, Grandvuillemin-extended parabolic, half-space Born,
Thomson-Chapman, rational linear) and systematically lead to several new
full-wave, wide-angle approximations [2-4,6].
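
As a rough consistency check on Eq. (4), and not a derivation taken from
the cited references, the leading behavior for large positive $X + Y$ can
be estimated. The estimate assumes that the integral in Eq. (4) runs
over $0 \le t < \infty$, that $Y > 0$, and that the endpoint $t = 0$
dominates the oscillatory integral, with
$X = (1/\epsilon)(\omega^2 q^2 - p^2)$ and $Y = K_0^2/\epsilon$. Near the
endpoint $\tanh t \approx t$ and $\mathrm{sech}\,t \approx 1$, and the
term $(\mathrm{sech}\,t)(\tanh t)$ is of lower order, so that

$$ \Omega_B(p,q) \approx -\frac{e^{i\pi/4}\epsilon^{1/2}}{\pi^{1/2}}\; i(X+Y) \int_0^\infty dt\; t^{-1/2} e^{i(X+Y)t} = -\frac{e^{i\pi/4}\epsilon^{1/2}}{\pi^{1/2}}\; i(X+Y)\; \frac{\pi^{1/2} e^{i\pi/4}}{(X+Y)^{1/2}} = \left(\epsilon(X+Y)\right)^{1/2} . $$

Since $\epsilon(X+Y) = K_0^2 + \omega^2 q^2 - p^2 = K^2(q) - p^2$, this
sketch gives $\Omega_B(p,q) \approx (K^2(q) - p^2)^{1/2}$, the locally
homogeneous form one would expect in a high-frequency limit.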

The exact pseudo-differential evolution equation (2) and, in general,
the wide-angle extended parabolic approximate equations derived from the
analysis of the composition equation [2-4,6] are singular
integro-differential wave equations. Solution representations for such
pseudo-differential equations can be directly expressed in terms of
infinite-dimensional functional, or path, integrals [7,8], following
from the Markov property of the propagator. In an operator notation,
then,
$$ \exp(i\bar{k}Bx) = \lim_{N\to\infty} \prod_{j=1}^{N} \exp(i\bar{k}B\,\Delta x_j) \tag{5} $$

where $\Delta x_j = x/N$, symbolically representing the propagator in
terms of the infinitesimal propagator. As the operator symbol is not
simply quadratic in $\mathbf{p}$, the configuration space Feynman path
integral formulation is not appropriate, necessitating the more general
phase space construction [4,7]. This results in a parabolic-based
(one-way) Hamiltonian phase space path integral representation of the
propagator in the form [3,7]

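$$ G^+(x,\mathbf{x}_t|0,\mathbf{x}_t') = \lim_{N\to\infty} \int_{R^{(n-1)(2N-1)}} \prod_{j=1}^{N-1} d\mathbf{x}_{jt} \prod_{j=1}^{N} (\bar{k}/2\pi)^{n-1} d\mathbf{p}_{jt}\; \exp\left(i\bar{k} \sum_{j=1}^{N} \left(\mathbf{p}_{jt}\cdot(\mathbf{x}_{jt} - \mathbf{x}_{j-1,t}) + (x/N)\, H(\mathbf{p}_{jt},\mathbf{x}_{jt},\mathbf{x}_{j-1,t})\right)\right) \tag{6} $$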


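where

$$ H(\mathbf{p},\mathbf{q}'',\mathbf{q}') = (\bar{k}/2\pi)^{n-1} \int_{R^{2n-2}} d\mathbf{s}\,d\mathbf{t}\; F(\mathbf{q}'-\mathbf{q}'',\mathbf{s})\; h_B(\mathbf{p}, ((\mathbf{q}''+\mathbf{q}')/2) - \mathbf{t})\, \exp(i\bar{k}\,\mathbf{s}\cdot\mathbf{t}) . \tag{7} $$

In Eq. (7), $F(\mathbf{u},\mathbf{v})$ and $h_B(\mathbf{p},\mathbf{q})$
are related to the operator symbol $\Omega_B(\mathbf{p},\mathbf{q})$ by

$$ \hat{\Omega}_B(\mathbf{u},\mathbf{v}) = F(\mathbf{u},\mathbf{v})\, \hat{h}_B(\mathbf{u},\mathbf{v}) \tag{8} $$

where $\hat{\Omega}_B$ and $\hat{h}_B$ are the corresponding Fourier
transforms [2,3,7].

Fig. 2. The real ( - ) and imaginary ( - ) parts of the n = 2 quadratic
medium symbol as a function of X for Y = 1.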

The nonuniqueness of the lattice-approximation path integral
representation is readily understood in terms of different
discretizations, or quadratures, of the symbolic functional integral and
corresponds to the representation of a given (fixed) operator by
different operator-ordering, or pseudo-differential operator, schemes
[2,3,7,8]. More fundamentally, in analogy with the Schrödinger equation
for particle motion on a Riemannian space and the thermodynamic
(Fokker-Planck) equation for particle diffusion, the algorithmic
Helmholtz path integral construction reflects the stochastic nature of
the integration [4,9]. Further, both the macroscopic and microscopic
(infinitesimal) half-space propagators can be formally expressed as
Fourier integral operators with complex phase [4]. The phase space path
integral, thus, represents the macroscopic Fourier integral operator in
terms of the N-fold application of the microscopic, or infinitesimal,
Fourier integral operator in a manner which can be related to the global
geometrical-optics construction of the macroscopic operator [4,5].

The path integral formulation interprets the wave theory in terms of an
infinitesimal propagator summed over all phase space paths. For the
Helmholtz theory, the exact infinitesimal propagator is not, in general,
given by the locally homogeneous medium propagator, as in the ordinary
parabolic (Schrödinger) propagator construction [8]. The approximate
extended parabolic wave theories then correspond to approximate
infinitesimal propagators summed over the complete phase space. In
retaining the "sum over all paths," diffraction, or full-wave, effects
are incorporated.

For weakly range-dependent environments, range variability can be, at
first, accommodated at the level of range updating, as in the case of
the parabolic path integral [1,8]. For reflection/transmission from a
planar interface separating two (different) transversely inhomogeneous
acoustic half-spaces, the concept of reflection and transmission
amplitudes generalizes to reflection (r) and transmission (t) operators.
The reflection and transmission operators, which, when applied to the
incident wave field at the interface, produce the initial values of the
reflected and transmitted wave fields, are defined within the Weyl
pseudo-differential operator framework and are explicitly determined by
enforcing the well-known interface continuity conditions. The main
result [10] is a composition equation of the form


$$ \cdots \int d\mathbf{t}\,d\mathbf{x}\,d\mathbf{y}\,d\mathbf{z}\; \left(\Omega_{B_L}(\mathbf{t}+\mathbf{p}, \mathbf{x}+\mathbf{q}) + \Omega_{B_R}(\mathbf{t}+\mathbf{p}, \mathbf{x}+\mathbf{q})\right) \Omega_r(\mathbf{y}+\mathbf{p}, \mathbf{z}+\mathbf{q})\, \exp(2i\bar{k}(\mathbf{x}\cdot\mathbf{y} - \mathbf{t}\cdot\mathbf{z})) \tag{9} $$

for the reflection operator symbol $\Omega_r(\mathbf{p},\mathbf{q})$ and
an analogous equation for the transmission operator symbol
$\Omega_t(\mathbf{p},\mathbf{q})$. The inclusion of a planar transition
region of arbitrary length and inhomogeneity can be accomplished by
factorization methods in conjunction with invariant imbedding [4,11].
Invariant imbedding constructs the initial-value system for the
reflection and transmission operators associated with the transition
region, transforming the Helmholtz boundary-value problem into an
initial-value problem. A discretized formulation [11] provides the
extension of Kennett's method [4,11] in reflection seismology. The
resultant forward and backward wave fields propagating in the
transversely inhomogeneous half-spaces are represented by the one-way
path integrals, while, within the transition region, a formal path
integral representation of the propagator can be expressed as a product
integral [8]. This takes the form [4]


$$ G = \prod_{a}^{x} \exp(i\bar{k}H(s)\,ds) = \lim_{N\to\infty} \prod_{j=1}^{N} \exp(i\bar{k}H(s_j)\,\Delta s_j) \tag{10} $$

where $s_j = a + (j - 1/2)\Delta s_j$, $\Delta s_j = (x-a)/N$, a denotes
the transition region boundary, H is the appropriate first-order
Helmholtz equation matrix operator [2,4], and with the product of
exponential factors ordered from right (lower j) to left (higher j)
reflecting the noncommutativity of the matrix operator H at different x.
While product integration-based path integral constructions have been
applied to the problems of nonrelativistic electron spin and the Dirac
equation, such infinite products of matrices are, generally, only
tractable in simple limiting cases [4,8].
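
The ordered product in Eq. (10) can be illustrated with a small numerical
sketch. The code below is not the Helmholtz matrix operator of the paper:
it uses a hypothetical 2 x 2 Hermitian matrix `example_h(s)` purely to
show the midpoint sampling $s_j = a + (j - 1/2)\Delta s$, the
right-to-left ordering of the factors, and the fact that reversing the
order changes the result when H at different s does not commute. The
function names and the `reverse` flag are assumptions made for
illustration.

```python
import numpy as np


def exp_i_hermitian(h, scale):
    """Return exp(1j * scale * h) for a Hermitian matrix h via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * scale * w)) @ v.conj().T


def ordered_product(h_of_s, a, x, k, n_steps, reverse=False):
    """Midpoint-rule product integral of exp(i*k*H(s) ds) from a to x.

    By default each new factor multiplies from the left, so lower j sits
    to the right, as described for Eq. (10). reverse=True uses the
    opposite ordering, for comparison only.
    """
    ds = (x - a) / n_steps
    g = np.eye(h_of_s(a).shape[0], dtype=complex)
    for j in range(1, n_steps + 1):
        step = exp_i_hermitian(h_of_s(a + (j - 0.5) * ds), k * ds)
        g = g @ step if reverse else step @ g
    return g


def example_h(s):
    """Hypothetical Hermitian H(s); matrices at different s do not commute."""
    return np.array([[1.0, s], [s, -1.0]])


if __name__ == "__main__":
    coarse = ordered_product(example_h, 0.0, 1.0, 1.0, 200)
    fine = ordered_product(example_h, 0.0, 1.0, 1.0, 400)
    swapped = ordered_product(example_h, 0.0, 1.0, 1.0, 400, reverse=True)
    print("refinement change:", np.max(np.abs(fine - coarse)))
    print("ordering change:  ", np.max(np.abs(fine - swapped)))
```

For a Hermitian H the product stays unitary at every N, which is a
convenient sanity check; the operator H of the text need not have that
property.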

Rather than starting from a transversely inhomogeneous formulation and,
subsequently, building in backscatter effects, the generalization of
Fourier methods to arbitrary inhomogeneous environments and the
construction of a dynamical basis for the Helmholtz equation can
proceed, in the second approach, from the construction of truly global
configuration space path integrals, which attempt to generalize, for
example, the homogeneous half-space result [3,7]


$$ G^+(x,\mathbf{x}_t|0,\mathbf{x}_t') = \lim_{N\to\infty} \int_{R^{(n-1)(N-1)}} \prod_{j=1}^{N-1} d\mathbf{x}_{jt}\; \left(i\pi x N^{(n-1)N/2}\right) \left(\bar{k}K_0/2\pi S_{(n-1)N+1}\right)^{((n-1)N+1)/2} H^{(1)}_{((n-1)N+1)/2}\left(\bar{k}K_0 S_{(n-1)N+1}\right) \tag{11} $$

where

$$ S_{(n-1)N+1} = \left(N \sum_{j=1}^{N} (\mathbf{x}_{jt} - \mathbf{x}_{j-1,t})^2 + x^2\right)^{1/2} \tag{12} $$

and $H^{(1)}_{\nu}(\xi)$ is the Hankel function. These elliptic-based
(two-way) constructions, originating from the Fourier transform
relationship between the Helmholtz and Schrödinger (parabolic)
propagators, result in the approximate Feynman/Garrod path integral
[3,7]

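$$ G(\mathbf{x}|\mathbf{x}') = (-1/2\bar{k}^2) \lim_{N\to\infty} \int_{R^{n(2N-1)}} \prod_{j=1}^{N-1} d\mathbf{x}_j \prod_{j=1}^{N} (\bar{k}/2\pi)^n d\mathbf{p}_j\; \frac{\exp(i\bar{k}S_N)}{(1/2 - E)} \tag{13} $$

where

$$ S_N = \sum_{j=1}^{N} \mathbf{p}_j\cdot(\mathbf{x}_j - \mathbf{x}_{j-1}) \tag{14} $$

corresponds to an appropriate discretized action and

$$ E = (1/N) \sum_{j=1}^{N} \left(\mathbf{p}_j^2/2 + V(\mathbf{x}_j)\right) \tag{15} $$

plays a role analogous to an average energy with the identification
$V(\mathbf{x}) = (-1/2)(K^2(\mathbf{x}) - 1)$. For a transversely
inhomogeneous half-space, partial integration of Eq. (13) in conjunction
with the reflection principle (or method of images) results in [3,7]

$$ G^+(x,\mathbf{x}_t|0,\mathbf{x}_t') \simeq \lim_{N\to\infty} \int_{R^{(n-1)(2N-1)}} \prod_{j=1}^{N-1} d\mathbf{x}_{jt} \prod_{j=1}^{N} (\bar{k}/2\pi)^{n-1} d\mathbf{p}_{jt}\; \exp\left(i\bar{k}\left(S_N + 2^{1/2} x (1/2 - E)^{1/2}\right)\right) \tag{16} $$

with $S_N$ and $E$ taking on their appropriate forms in one-lower
dimension.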

Formally reducing both the full- and transversely inhomogeneous
half-space phase space Feynman/Garrod path integrals to configuration
space path integrals [7] establishes the path functional character of
the representation. Moreover, the approximate Feynman/Garrod path
integral is exact in the homogeneous medium limit, incorporates
significant backscatter information, and contains both the geometrical
(ray) acoustic and ordinary parabolic approximations. This configuration
space formulation for the two-way problem, initially based on a
variational principle and phase space constructions, seeks to express
the propagator in terms of a phase functional evaluated over an
appropriate path space, as symbolically expressed in the
Feynman/DeWitt-Morette representation [3,7,9]. This takes the form


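$$ G(\mathbf{x}|\mathbf{x}') = (-1/2\bar{k}^2) \int_{E} \mathcal{D}\xi\; \exp(i\bar{k}\,\Omega(\xi)) \tag{17} $$

where

$$ \Omega = \int \|d\xi\|\, (1 - 2V(\xi))^{1/2} \tag{18} $$

is the analog of the action associated with a "free particle" on a space
with the metric

$$ dl^2 = (1 - 2V(\xi))\, \|d\xi\|^2 \tag{19} $$

and where E represents the space of paths from $\mathbf{x}'$ to
$\mathbf{x}$ such that

$$ 1/2 = (1/\tau) \int_0^{\tau} dt\, \left((1/2)\|d\xi(t)/dt\|^2 + V(\xi(t))\right) \tag{20} $$

with the constraints

$$ \xi(0) = \mathbf{x}' , \qquad \xi(\tau) = \mathbf{x} . \tag{21} $$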

The dynamical basis of the Helmholtz equation can, thus, be viewed in
terms of a stochastic process embodying fixed "average energy" paths,
or, alternatively, in terms of "free particle" motion [3,7,9].

III. COMPUTATIONAL ALGORITHMS. Direct integration of the one-way phase
space path integral provides the computational basis for the
pseudo-differential wave equation (2). Choosing the standard ordering,
$F(\mathbf{u},\mathbf{v}) = \exp(-i\bar{k}\,\mathbf{u}\cdot\mathbf{v}/2)$,
in Eqs. (6), (7), and (8) results in a numerically more efficient
post-point marching algorithm in the form


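$$ \phi^+(x+\Delta x,\mathbf{x}_t) \simeq \int_{R^{n-1}} d\mathbf{p}_t\, \exp(i\bar{k}\,\mathbf{p}_t\cdot\mathbf{x}_t)\, \left(\exp(i\bar{k}\Delta x\, h_B(\mathbf{p}_t,\mathbf{x}_t))\, \hat{\phi}^+(x,\mathbf{p}_t)\right) \tag{22} $$

where $\hat{\phi}^+$ is the Fourier-transformed wave field and

$$ h_B(\mathbf{p}_t,\mathbf{x}_t) = (\bar{k}/\pi)^{n-1} \int_{R^{2n-2}} d\mathbf{s}\,d\mathbf{t}\; \Omega_B(\mathbf{s},\mathbf{t})\, \exp(-2i\bar{k}(\mathbf{x}_t - \mathbf{t})\cdot(\mathbf{p}_t - \mathbf{s})) . \tag{23} $$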
This marching algorithm provides the generalization of the
Tappert/Hardin split-step FFT algorithm [1] to the full one-way
(factored Helmholtz) wave equation. For a two-dimensional model
ocean/bottom propagation environment with a perfectly reflecting ocean
surface, the Fourier transform of the wave field in Eq. (22) is replaced
by a discrete fast sine transform and the inverse transform is evaluated
by a rectangular rule integration, enabling the propagated wave field to
be expressed in the matrix form


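$$ \phi^+(x+\Delta x, z_n) = \sum_{m} A_{nm}\, \hat{\phi}^+(x, p_m) \tag{24} $$

for each depth point $z_n$. In Eq. (24), $\phi^+$ and $\hat{\phi}^+$ are
column vectors and the matrix A is defined by its matrix elements

$$ A_{nm} = \eta\, \sin\left(\bar{k} p_m z_n + \bar{k}\Delta x\, h_B^{o}(p_m, z_n)\right) \exp\left(i\bar{k}\Delta x\, h_B^{e}(p_m, z_n)\right) \tag{25} $$

where $h_B^{e}$ and $h_B^{o}$ are the even and odd parts with respect to
p of $h_B(p,z)$ in Eq. (23) and $\eta$ is an appropriate transform
normalization constant [1,4,12].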

The principal idea underlying the practical implementation of the phase
space marching algorithm is the construction of a small number of
approximate operator symbols, which, when taken together, allow for wave
field computations over a very wide range of model environments and
propagation parameters. In conjunction with a study of exactly soluble
cases of the Weyl composition equation [6], high-frequency, real Weyl
high-frequency, uniform high-frequency, and low-frequency approximate
symbols have been constructed [2-4,6]. Of particular significance is the
fact that the manner of marching the radiation field is independent of
the medium and any approximation to the square root Helmholtz operator,
resulting in a modular code architecture and highly versatile
propagation program. Moreover, the propagation models constructed and
computed through the code correspond to singular integro-differential
equation as well as partial differential equation approximations to the
one-way wave equation. Indeed, this numerical algorithm represents one
of the very few attempts to compute directly with pseudo-differential
and Fourier integral operators. For the two-dimensional case, the
range-incrementing procedure is just a sequence of matrix
multiplications, and is, thus, ideally suited for computers which
provide either a vector or a parallel pipe type of operation. Phase
space filtering reduces both the size of the matrix multiplication and
the number of matrix elements initially computed, in particular reducing
the total range-incrementing computational time by almost an order of
magnitude for typical model calculations [4].
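
The modular structure described above, in which the marching step is a
matrix product and the medium enters only through the symbol, can be
sketched as follows. This is an illustration under stated assumptions,
not the code of the paper: it uses a periodic FFT in place of the fast
sine transform, a uniform depth grid, and the function names
`marching_matrix`, `parabolic_symbol`, and `march` are hypothetical. The
only symbol supplied is the ordinary parabolic one,
$1 + (K^2(z) - 1)/2 - p^2/2$, for which the matrix step reduces to the
familiar split-step FFT scheme; any other approximate standard-ordered
symbol $h(p,z)$ could be passed in its place.

```python
import numpy as np


def marching_matrix(symbol, z, k, dx):
    """Build A so that field(x + dx) = A @ np.fft.fft(field(x)).

    symbol(p, z) is a standard-ordered symbol approximating the square
    root Helmholtz operator; p is the transverse wave number divided by
    the reference wave number k. z must be a uniform grid.
    """
    n = z.size
    kappa = 2.0 * np.pi * np.fft.fftfreq(n, d=z[1] - z[0])  # FFT wave numbers
    # Inverse-DFT kernel exp(i * kappa_m * (z_n - z_0)) / n
    inverse_dft = np.exp(1j * np.outer(z - z[0], kappa)) / n
    # Short-range propagator factor, evaluated on the (depth, wave number) grid
    phase = np.exp(1j * k * dx * symbol(kappa[None, :] / k, z[:, None]))
    return inverse_dft * phase


def parabolic_symbol(index_profile):
    """Ordinary parabolic symbol 1 + (K^2 - 1)/2 - p^2/2 for a profile K(z)."""
    def symbol(p, z):
        return 1.0 + (index_profile(z) ** 2 - 1.0) / 2.0 - p ** 2 / 2.0
    return symbol


def march(field, matrix, n_steps):
    """Range-increment the field: one FFT and one matrix product per step."""
    for _ in range(n_steps):
        field = matrix @ np.fft.fft(field)
    return field
```

Because the symbol is evaluated once when the matrix is built, the
range loop itself does not depend on the medium, which is the point made
in the text about the modular code architecture.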

Numerical results of transmission loss (dB re 1 m) as a function of
range (km) for a number of model ocean/bottom propagation experiments
demonstrate the computational viability of the factorization-/path
integration-based phase space marching algorithm [4,12]. Several
propagation experiments are summarized in Figures 3, 5, and 7, with the
corresponding transmission loss curves compared with a reference Fast
Field Program (FFP) algorithm [4,12] in Figures 4, 6, and 8.
Fig. 3. Model environment 1 and propagation experiment.

Fig. 4. Transmission loss (dB re 1 m) versus range (km) for model
environment 1 at 400 Hz. ( - ) High Frequency (80 degree filter)
(....) FFP

Fig. 5. Model environment 2 and propagation experiment.

Fig. 6. Transmission loss (dB re 1 m) versus range (km) for model
environment 2 at 250 Hz. ( - ) High Frequency (60 degree filter)
( - ) High Frequency (....) FFP

Fig. 7. Model environment 3 and propagation experiment.


For 400 Hz propagation in the exaggerated double-well model of Figure
3, a wide-angle capability well beyond the ordinary parabolic
approximation is required. Figure 4 illustrates the excellent agreement
between the high-frequency and FFP algorithms over ranges on the order
of 500 wavelengths. Figure 6 illustrates the cumulative growth of a
phase shift error at long range which characterizes the breakdown of the
high-frequency algorithm in the 250 Hz propagation in the rapidly
changing shallow-water model of Figure 5. Combining Fourier component,
or wave number, filtering with the high-frequency algorithm leads not
only to a more efficient and, thus, faster algorithm, but also to a more
widely applicable numerical scheme. The filtering, in addition to
removing Fourier components which, in principle, make no significant
contribution to the computed wave field, eliminates those unnecessary
regions of phase space where the small error in the high-frequency
symbol approximation can lead, in a cumulative manner, to serious
discrepancies at sufficiently long ranges. This is particularly well
illustrated in the 60 degree filtered calculation on model environment 2
at 250 Hz, which results in the complete elimination of the cumulative
phase shift error (Figure 6), greatly extending the effective
computational range. Sufficiently decreasing the propagation frequency
and increasing the jump discontinuity in the sound speed, as illustrated
in the 25 Hz propagation in the shallow-water model of Figure 7,
demonstrates the violation of energy conservation inherent in the
high-frequency wave theory and the now-rapid decay with increasing range
of the corresponding numerical algorithm. This growth in the wave field,
illustrated in Figure 8, is eliminated by the real Weyl high-frequency
algorithm [4], which effectively restores energy conservation, as is
also illustrated in Figure 8. A more detailed discussion of these and
other points is presented elsewhere [1,4,12].
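
The wave number filtering discussed above can be pictured with a short
sketch. The shape of the filter used in the reported calculations is not
specified here, so the code assumes the simplest possibility: a sharp
cutoff that keeps only transverse wave numbers whose propagation angle,
measured against the reference wave number k, does not exceed a chosen
maximum (for example 60 or 80 degrees). The names `angle_filter` and
`filtered_columns` are hypothetical.

```python
import numpy as np


def angle_filter(n_points, dz, k, max_angle_deg):
    """Boolean mask over FFT wave numbers keeping |kappa| <= k * sin(max angle)."""
    kappa = 2.0 * np.pi * np.fft.fftfreq(n_points, d=dz)
    return np.abs(kappa) <= k * np.sin(np.radians(max_angle_deg))


def filtered_columns(matrix, mask):
    """Drop the marching-matrix columns of the filtered-out wave numbers."""
    return matrix[:, mask]


if __name__ == "__main__":
    mask = angle_filter(256, 100.0 / 256, 2.0, 60.0)
    print("retained Fourier components:", int(mask.sum()), "of", mask.size)
```

With such a mask, a step of the form `matrix[:, mask] @ spectrum[mask]`
touches fewer matrix elements, which is one way to read the statement
that filtering reduces both the size of the matrix multiplication and the
number of elements initially computed.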

The speed and modest storage requirements of the filtered one-way
algorithm indicate that range-dependent calculations over extended
environments should be feasible with current supercomputer technology.
Both range-updating and the numerical calculation of the reflected and
transmitted fields from an interface should be possible over distances
on the order of $10^4$ wavelengths. Preliminary computations with
range-dependent Munk-profile deep ocean environments, including
propagation through extended shadow regions, compare well with adiabatic
normal-mode calculations.

Both the range-dependent and range-independent Feynman/Garrod path
integral representations can be computed by standard Monte Carlo
(statistical sampling) methods for the numerical evaluation of multiple
integrals [4]. While numerically calculating Helmholtz wave fields as
high (in principle, infinite)-dimensional integrals is quite distinct
from the more traditional finite-difference and finite-element
approaches, the Monte Carlo evaluation of functional integrals has been
successfully applied in quantum mechanical, statistical mechanical, and
quantum field theoretical calculations [4]. For the phase space
representations of Eqs. (13) and (16) in two dimensions (n = 2), the
modeling of realistic propagation experiments can involve the
computation of thousand-dimensional oscillatory integrals.
Correlated-sampling variance reduction techniques can dramatically
improve the speed and accuracy of the algorithm [4]. Generally speaking,
a large parallel processing capability should have a very favorable
impact on the numerical computation of path integrals [4].
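
The idea behind correlated sampling can be shown on a toy oscillatory
integral; the sketch below is not the path integral of Eqs. (13) or (16).
It assumes a hypothetical integrand $\exp(i a \sum_j x_j)$ over the unit
cube, whose exact value is known in closed form, and a nearby reference
integrand with parameter `a_ref`. The same random points are used for
both, so only the small difference between them is estimated
statistically and the known reference value is added back.

```python
import numpy as np


def exact_phase_integral(a, d):
    """Closed form of the integral of exp(i*a*sum(x)) over the unit cube [0,1]^d."""
    return ((np.exp(1j * a) - 1.0) / (1j * a)) ** d


def monte_carlo_estimates(a, a_ref, d, n_samples, rng):
    """Plain and correlated-sampling estimates of the d-dimensional integral.

    Returns the two estimates and the sample standard deviations of the
    quantities being averaged in each case.
    """
    x = rng.random((n_samples, d))
    s = x.sum(axis=1)
    f = np.exp(1j * a * s)          # target integrand
    g = np.exp(1j * a_ref * s)      # reference integrand with known integral
    plain = f.mean()
    correlated = exact_phase_integral(a_ref, d) + (f - g).mean()
    return plain, correlated, f.std(), (f - g).std()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    plain, corr, sd_plain, sd_corr = monte_carlo_estimates(1.0, 0.98, 10, 20000, rng)
    print("exact     :", exact_phase_integral(1.0, 10))
    print("plain     :", plain, " spread:", sd_plain)
    print("correlated:", corr, " spread:", sd_corr)
```

In a propagation setting one natural candidate for the reference would
be the homogeneous medium case, for which the text notes that the
Feynman/Garrod representation is exact; that choice is a suggestion
here, not a description of the method used in [4].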

IV. INVERSE FORMULATION. The phase space-based construction of the
square root Helmholtz operator provides the basis for a formulation of
the inverse algorithms mentioned in the Introduction. Mathematically,
the refractive index field (or its square) is reconstructed from the
full-space Helmholtz Green's function G through the relationship

$$ B(\mathbf{x}_t,\mathbf{x}_t') = (2i/\bar{k}) \lim_{x\to 0} \partial_x^2 \left(G(x,\mathbf{x}_t|0,\mathbf{x}_t')\right) . \tag{26} $$

The symbol $\Omega_B(\mathbf{p},\mathbf{q})$ is then constructed through
an inverse Fourier transform of the kernel function
$B(\mathbf{x}_t,\mathbf{x}_t')$ and subsequently yields the refractive
index field upon a direct application of the Weyl composition equation
(3) for $|\mathbf{p}| = 0$. In the homogeneous medium limit, the direct
evaluation of the composite symbol reduces to the square of the symbol,
$\Omega_{B^2}(\mathbf{p},\mathbf{q}) = \Omega_B^2(\mathbf{p},\mathbf{q})$.

The inverse algorithm proceeds around the correspondence diagram
(pictorial summary) in a counterclockwise fashion. The direct
propagation algorithm requires the inversion of Eq. (3) while the
inverse propagation algorithm only requires a direct computation of
Eq. (3). Thus the direct propagation problem has been transformed into
an "inverse" problem while the wave field inversion problem has been
reformulated, in an appropriate sense, as a direct calculation.
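
The homogeneous medium statement can be checked in a few lines. The
sketch assumes that the composition integral in Eq. (3) carries the
prefactor $(\bar{k}/\pi)^{2n-2}$ and the kernel
$\exp(2i\bar{k}(\mathbf{x}\cdot\mathbf{y} - \mathbf{t}\cdot\mathbf{z}))$,
and that for $K(\mathbf{q}) = K_0$ the symbol does not depend on
$\mathbf{q}$, so $\Omega_B(\mathbf{p},\mathbf{q}) = \Omega_B(\mathbf{p})$.
The $\mathbf{x}$ and $\mathbf{z}$ integrations then produce delta
functions,

$$ \int_{R^{n-1}} d\mathbf{x}\, e^{2i\bar{k}\,\mathbf{x}\cdot\mathbf{y}} = (\pi/\bar{k})^{n-1}\delta(\mathbf{y}), \qquad \int_{R^{n-1}} d\mathbf{z}\, e^{-2i\bar{k}\,\mathbf{t}\cdot\mathbf{z}} = (\pi/\bar{k})^{n-1}\delta(\mathbf{t}), $$

so that

$$ \Omega_{B^2}(\mathbf{p}) = (\bar{k}/\pi)^{2n-2} (\pi/\bar{k})^{2n-2} \int d\mathbf{t}\,d\mathbf{y}\; \Omega_B(\mathbf{t}+\mathbf{p})\, \Omega_B(\mathbf{y}+\mathbf{p})\, \delta(\mathbf{y})\,\delta(\mathbf{t}) = \Omega_B^2(\mathbf{p}) . $$

With $\Omega_{B^2}(\mathbf{p}) = K_0^2 - \mathbf{p}^2$ this gives
$\Omega_B(\mathbf{p}) = (K_0^2 - \mathbf{p}^2)^{1/2}$, and at
$|\mathbf{p}| = 0$ the composite symbol returns $K_0^2$, the square of
the refractive index, as the inverse algorithm requires.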

The factorization algorithm exactly inverts the inherently nonlinear
relationship between the wave field data and the refractive index field
as reflected in the Lippmann-Schwinger equation for the propagator [3].
Most importantly, it is a multidimensional formulation. For the
"physical experiment," a point source is introduced into the medium
defining the initial-value (x = 0) plane. The second derivative with
respect to the range of the wave field is then determined as a function
of the point source and receiver positions. Collecting the data on the
initial-value plane would most probably limit the application of the
algorithm to specific types of bore-hole experiments. Moreover,
mathematically, the inversion requires the evaluation of singular
integrals (generalized functions). Collecting data on a downfield plane
(x > 0) leads to a transmission experiment similar to the oceanic sound
speed profile inversion method of DeSanto [3]. The downfield wave field
provides for an appropriate analytic continuation in the factorization
algorithm and connects the analysis with the inverse diffraction problem
[3].

The transmission, or propagation, formulation is analogous to
tomography. The reference wave number in the factorization analysis
corresponds to $2\pi$/(Planck's constant) as opposed to its square
playing the role of an energy. The source generation and data collection
over parallel planes then naturally correspond to the multidirectional
insonifying plane waves and subsequent angular data collection of
fixed-energy (frequency) diffraction tomography [3]. For range-dependent
environments, the inclusion of backscatter effects, even in an
approximate manner, would then provide the basis for a generalized
acoustic tomography, extending the diffraction algorithms based on the
Born, Rytov, or distorted-wave Born approximations [3]. The nonlinear
factorization and subsequent weak-backscatter perturbation theory would
extend the linearized weak-scattering treatments into the nonlinear
regime. This can be attempted in two ways. Formal field splitting
analysis provides the basis for a weak-backscatter perturbation theory
within the framework of invariant imbedding [2-4]. The
arbitrary-dimensional nature of the factorization analysis in
conjunction with mathematical imbedding concepts provides the basis for
a spatial-dimensional perturbation theory [2-4]. This essentially
involves treating the spatial dimension of both the Helmholtz operator,
in general, and the refractive index field, in particular, as a variable
and subsequently studying the structure of the resulting family of
systems indexed in this manner. For the case of two (different)
transversely inhomogeneous half-spaces separated by a planar interface,
an inverse algorithm can be initially based on the composition equation
(9).

For a transversely inhomogeneous environment, the factorization
inversion model invites comparison with "effective one-dimensional"
stratified environmental models such as that of Stickler and Deift [4]. In
both models, the location of the field source (finite) and the data
measurements is within the scattering region. Most importantly, the factorization
method is a direct inversion of an arbitrary-dimensional propagation
equation which requires less symmetry than those models (i.e.,
Stickler-Deift) reducible to the standard one-dimensional formulation of
Deift-Trubowitz [4] or Gelfand-Levitan [4]. Thus, for example, in a general
n-dimensional Cartesian formulation, the refractive index field can be a
function of as many as (n-1) coordinates in the factorization model, while a
function of only one coordinate in an "effective one-dimensional" model.

The experiment envisioned and the distinguished direction differ in the two
models. In the transversely inhomogeneous environment, the direction in
which there is medium homogeneity is distinguished, while in the "effective
one-dimensional" model, the one direction in which there is medium
inhomogeneity is, in effect, distinguished. Data, in both cases, is collected
perpendicular to the distinguished direction. The Stickler-Deift model is
essentially a one-dimensional scattering experiment with the surface data,
in effect, reflection coefficient data. Thus, unlike the transmission
experiment in the factorization model, which extensively samples the region
of inhomogeneity, the Stickler-Deift analysis does not account for the
presence of "trapped modes" [4]. The formal inclusion of a specific
pressure-release surface within the pseudo-differential operator framework
would allow for a stratified environmental model and the subsequent
quantitative comparison with the Stickler-Deift model.

For applied inverse problems, approximate inversions may prove
adequate. Approximate inversion algorithms follow readily from the
perturbative treatments of the Weyl composition equation. $K^2(q)$ is related to
$\Omega_B(p,q)$ in a quadratic fashion and through a linear integral relationship,
respectively, in the high-frequency ($\bar{k} \to \infty$) and weak-inhomogeneity (Born)
limits. In particular, the high-frequency algorithm is based upon choosing,
in practice, a $|p|$ such that the symbol approaches its asymptotic form,
$\Omega_B(p,q) \sim (K^2(q) - p^2)^{1/2}$. The approach to the asymptotic regime in phase
space is governed both by the magnitude of $K^2(q) - p^2$ (large) and the
variation of the refractive index field on the wavelength scale (small).
Figure 9 illustrates the high-frequency inversion for the case of a
quadratic medium. Applying the full composition equation for the inversion
would result in a linear function in X for the real part and an imaginary
part which is identically zero. Finally, weighted Hilbert space methods for
incorporating prior estimates appear to be applicable to the Fourier-based
factorization approach [4].
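
The high-frequency step described above can be sketched numerically. This is a minimal illustration only: it assumes the asymptotic symbol has the square-root form (K^2 - p^2)^(1/2) in the propagating regime, and the quadratic profile, its coefficients, and all function names are hypothetical choices made for the example, not quantities taken from the paper. Because the synthetic data are generated from the asymptotic form itself, the recovery is exact here; with measured symbol data it would only be approximate.

```python
import math


def asymptotic_symbol(k_squared, p):
    """Assumed high-frequency form of the symbol: sqrt(K^2(q) - p^2)."""
    if k_squared <= p * p:
        raise ValueError("sketch covers only the propagating regime K^2 > p^2")
    return math.sqrt(k_squared - p * p)


def invert_high_frequency(symbol_value, p):
    """Recover K^2(q) from one symbol sample taken at transverse wave number p."""
    return symbol_value ** 2 + p ** 2


def quadratic_profile(q, k0_squared=4.0, curvature=0.5):
    """Hypothetical quadratic medium: K^2(q) = k0^2 - curvature * q^2."""
    return k0_squared - curvature * q * q


if __name__ == "__main__":
    p = 0.3  # illustrative transverse wave number
    for q in (0.0, 0.5, 1.0):
        true_k2 = quadratic_profile(q)
        sample = asymptotic_symbol(true_k2, p)
        print(q, true_k2, invert_high_frequency(sample, p))
```

The inversion is pointwise in the transverse coordinate, which is why the quadratic relationship in this limit is simpler to apply than the linear integral relationship of the Born limit.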
SCALE INVARIANT EQUATIONS FOR RELATIVISTIC WAVES
Vicksburg, Mississippi 39180


ABSTRACT. The basic trace equation of relativistic thermodynamics is
decoupled into two Callan-Symanzik type renormalization group equations that
connect the matter fields with the thermodynamic gauge parameters. These
equations determine two characteristic curves along which the solution to
the trace equation assumes a simple form. The differential equation
describing the variation of the Grüneisen parameter with pressure is derived. A
perturbation procedure is applied to the potential form of the
renormalization group equations in order to develop the corresponding potential form of
the renormalization equations for waves in a relativistic medium. A method
for calculating the Debye temperature for the excited states of solids and
quantum liquids is developed. The amplitude and spectrum of waves in
thermodynamic media are calculated. A simple equation is derived that scales
the wave amplitudes for different material densities (pressures). The
results of this paper will have applications to nuclear blast loadings, the
interaction of directed energy beams with matter, and to various high
density geophysical and astrophysical phenomena.

1. INTRODUCTION. The renormalization group was originally developed
for problems in quantum field theory.1,2 But over the years it has become
an important technique in many areas of physics, including phase
transitions, critical phenomena, hydrodynamics, and statistical mechanics.3-5
The renormalization group consists of a set of continuous transformations
that establish a correspondence between sets of parameters that define
physically different states. In particular, the renormalization group gives
a correspondence between systems having different correlation lengths. The
correlation length of a physical system is the distance over which local
particle densities are correlated. Ordinarily the correlation length is
approximately equal to the range of interaction between two component
particles; however, near the critical point of a fluid the correlation length
is much greater than the range of pair interactions.6 The renormalization
group is commonly described by a set of differential equations for the
physical state parameters.3

A set of renormalization group equations in potential form has been
developed for the ground state parameters of a relativistic thermodynamic
system.7 In this case the correspondence between sets of parameters refers
to a change of the local scale (gauge), and this change of scale is
equivalent to a change in the correlation length. The potential form of the
renormalization group equations consists of a set of differential equations
for the two gauge parameters of relativistic thermodynamics.7 These
equations are obtained by requiring the basic trace equation of relativistic
thermodynamics to be invariant under a local scale transformation that
corresponds to a change in the correlation length of the system.

The  trace  equation  of  relativistic  thermodynamics  is  written  as8 


(1) 


where U = relativistic internal energy, P = relativistic pressure, T =
absolute temperature, V = volume of substance, and $U^a$ and $P^a$ = corresponding
nonrelativistic internal energy and pressure. Throughout this paper the index
"a" will refer to nonrelativistic calculations. The trace equation (1) can
be rewritten as7


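$$\left(1 - b + T\frac{\partial}{\partial T} - bV\frac{\partial}{\partial V}\right)E - 3\left(1 + \gamma + V\frac{\partial}{\partial V} - \gamma T\frac{\partial}{\partial T}\right)P = \left(1 - b^a + T\frac{\partial}{\partial T} - b^a V\frac{\partial}{\partial V}\right)E^a \qquad (2)$$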


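where E = relativistic energy density = U/V, $E^a$ = nonrelativistic energy
density, and where7

$$\gamma = \frac{V}{C_V}\left(\frac{\partial P}{\partial T}\right)_V \qquad (3)$$

$$b = \frac{T(\partial P/\partial T)_V}{P - K_T} \qquad (4)$$

$$b^a = \frac{T(\partial P^a/\partial T)_V}{P^a - K_T^a} \qquad (5)$$

where $\gamma$ = Grüneisen parameter, $C_V$ = heat capacity at constant volume, and

$$K_T = -V\left(\frac{\partial P}{\partial V}\right)_T \qquad (6)$$

$$K_T^a = -V\left(\frac{\partial P^a}{\partial V}\right)_T \qquad (7)$$

are the relativistic and nonrelativistic values of the bulk modulus
respectively. The parameters b and $\gamma$ are the gauge parameters of relativistic
thermodynamics.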

For a solid or low temperature quantum system the nonrelativistic state
equation of the ground state is assumed to have the following form8,9

$$E^a = E_0^a + E_j^a T^j + \cdots \qquad (8)$$

$$P^a = P_0^a + P_j^a T^j + \cdots \qquad (9)$$

where $E^a$ and $P^a$ = nonrelativistic energy density and pressure respectively,
$E_0^a$ and $P_0^a$ = nonrelativistic zero-temperature values of the energy density and
pressure respectively, $E_j^a$ and $P_j^a$ = nonrelativistic thermal coefficients for
the energy density and pressure respectively, T = absolute temperature of the
system (°K), and j = numerical index having values characteristic of the type
of physical system. Typical examples of systems that are described by
equations (8) and (9) are8

j = 1      high temperature solid

j = 2      low temperature Fermi gas

j = 5/2    low temperature molecular Bose gas

j = 4      low temperature solid

A commonly used descriptor of the thermal state equations given by
equations (8) and (9) is the nonrelativistic zero-temperature value of the
Grüneisen parameter that is defined by8,9

$$\gamma_0^a = \frac{1}{(j-1)}\,\frac{1}{E_j^a}\,\frac{d}{dV}\left(V E_j^a\right) \qquad (10)$$

except for j = 1. Here $\gamma_0^a$ = nonrelativistic zero-temperature value of the
Grüneisen parameter, and V = volume of the material system. When j = 1,
$\gamma_0^a = 2/3$. The zero temperature value of the nonrelativistic bulk modulus
is given by $K_0^a = n\,dP_0^a/dn$, where n = N/V = number of moles per unit volume,
and N = number of moles of a substance.
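
As a rough numerical illustration of the zero-temperature Grüneisen parameter defined above, the sketch below evaluates [1/(j-1)] (1/E_j) d(V E_j)/dV by a central difference. The power-law thermal coefficient, its exponent, and the function names are hypothetical inputs chosen only so that the answer is known in closed form; they are not data from the paper.

```python
def gruneisen_zero_temperature(energy_coeff, volume, j, rel_step=1e-6):
    """Evaluate gamma_0 = [1/(j-1)] * (1/E_j) * d(V*E_j)/dV numerically.

    energy_coeff: callable returning the thermal coefficient E_j at a volume.
    """
    if j == 1:
        return 2.0 / 3.0  # value quoted in the text for the j = 1 case
    h = volume * rel_step
    d_ve = ((volume + h) * energy_coeff(volume + h)
            - (volume - h) * energy_coeff(volume - h)) / (2.0 * h)
    return d_ve / ((j - 1) * energy_coeff(volume))


def power_law_coefficient(volume, c=1.0, exponent=5.0):
    """Hypothetical thermal coefficient E_j(V) = c * V**exponent."""
    return c * volume ** exponent


if __name__ == "__main__":
    # For E_j ~ V**5 and j = 4 the closed form is (5 + 1) / (4 - 1) = 2.
    print(gruneisen_zero_temperature(power_law_coefficient, 1.3, j=4))
```

For a pure power law the result is independent of the volume at which it is evaluated, which gives a quick sanity check on any tabulated coefficient.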


The corresponding relativistic state equations will be written as8,9

$$E = E_0 + E_j T^j + \cdots \qquad (11)$$

$$P = P_0 + P_j T^j + \cdots \qquad (12)$$

$$\gamma_0 = \frac{1}{(j-1)}\,\frac{1}{E_j}\,\frac{\partial}{\partial V}\left(V E_j\right) \qquad (13)$$

except for j = 1, when $\gamma_0 = 2/3$, and where $E_0$ and $P_0$ = relativistic
zero-temperature energy density and pressure respectively, $E_j$ and $P_j$ =
relativistic thermal coefficients for the energy density and pressure respectively,
and $\gamma_0$ = relativistic zero-temperature Grüneisen parameter. The relativistic
value of the zero temperature bulk modulus is given by $K_0 = n\,dP_0/dn$.
Combining equation (2) with the state equations (8) through (13) yields the
following ground state equations8


$$E_0 - 3\left[(1 + \gamma_0)P_0 - K_0\right] = E_0^a \qquad (14)$$

(15)


The potential forms of the ground state renormalization group equations
for the gauge parameters b and $\gamma$ are determined from the requirement of local
scale invariance of equation (2).7 A form of the renormalization group
equations that is commonly used in quantum field theory and the theory of critical
phenomena is the set of Callan-Symanzik equations.1-3 In this paper the
Callan-Symanzik form of the renormalization group equations for the relativistic
ground state of thermal media will be obtained directly from equation (2).

Two forms of the renormalization group equations for radiation in matter
are obtained in this paper. The potential form of these equations is obtained
by a perturbation procedure that is applied to the potential form of the
renormalization group equations for the ground state. The Callan-Symanzik form
of the renormalization group equations for radiation will be obtained by a
perturbation procedure applied directly to equation (2).

An important task of modern physics is the determination of the effects
of gauge (scale) invariance on the ground state and excitations of matter.
These effects have been treated for the ground state of a thermodynamic
system.8 An approximate treatment has been developed for waves in solids and
quantum liquids by assuming that the diffuse radiation factor and the
radiation Grüneisen parameter are equal.7 This paper presents a completely general
procedure for calculating the relativistic amplitude and spectrum of elastic
waves in solids and quantum liquids. A set of coupled second order radiation
equations is developed that determines the relativistic energy density and
Grüneisen parameter for radiation in solids and Fermi and Bose liquids.
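2. RENORMALIZATION GROUP EQUATIONS FOR THE GROUND STATE. The ground
state of a relativistic thermodynamic medium is described by equation (2)
where $\gamma$ and b are gauge parameters.7 Equation (2) can be decoupled into two
independent equations by noting that E and P are related by the
Gibbs-Helmholtz relation as follows

$$\left(\frac{\partial U}{\partial V}\right)_T = E + V\left(\frac{\partial E}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P \qquad (16)$$

With the introduction of a Lagrange undetermined multiplier $\eta$, equation (16)
can be rewritten as

$$\eta\left(1 + V\frac{\partial}{\partial V}\right)E + \eta\left(1 - T\frac{\partial}{\partial T}\right)P = 0 \qquad (17)$$

Combining equations (2) and (17) yields the following decoupled equations

$$\left[T\frac{\partial}{\partial T} + (\eta - b)V\frac{\partial}{\partial V} + \eta + 1 - b\right]E = \left(T\frac{\partial}{\partial T} - b^a V\frac{\partial}{\partial V} + 1 - b^a\right)E^a \qquad (18)$$

$$\left[V\frac{\partial}{\partial V} - \left(\gamma - \frac{\eta}{3}\right)T\frac{\partial}{\partial T} - \frac{\eta}{3} + \gamma + 1\right]P = 0 \qquad (19)$$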

The undetermined multiplier $\eta$ is in general a function of V and T. From
equation (19) it follows that

$$\frac{\eta}{3} = \frac{V\dfrac{\partial P}{\partial V} - \gamma T\dfrac{\partial P}{\partial T} + (\gamma + 1)P}{P - T\dfrac{\partial P}{\partial T}} \qquad (20)$$

except when the denominator is zero (as in the case of an ideal gas for
which P = nRT). For T = 0, equations (18) through (20) become

$$\left(\eta_0 V\frac{d}{dV} + \eta_0 + 1\right)E_0 = E_0^a \qquad (21)$$

$$\left(V\frac{d}{dV} - \frac{\eta_0}{3} + \gamma_0 + 1\right)P_0 = 0 \qquad (22)$$

$$\frac{\eta_0}{3} = \frac{V}{P_0}\frac{dP_0}{dV} + \gamma_0 + 1 \qquad (23)$$

where $\eta_0$ = the Lagrange undetermined multiplier for T = 0. It is easy to
show that combining equations (21) through (23) by eliminating the Lagrange
multiplier and using the T = 0 form of equation (16), which is

$$P_0 = -\frac{d}{dV}\left(V E_0\right) \qquad (24)$$

gives the T = 0 ground state equation (14).
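
The elimination of the Lagrange multiplier mentioned above can be written out in a few lines. The sketch below assumes the zero-temperature relations read as follows: the energy equation $(\eta_0 V\,d/dV + \eta_0 + 1)E_0 = E_0^a$, the multiplier relation $\eta_0/3 = (V/P_0)\,dP_0/dV + \gamma_0 + 1$, and the T = 0 Gibbs-Helmholtz relation $P_0 = -d(VE_0)/dV$. These readings, and the identification $K_0 = -V\,dP_0/dV$, are assumptions made for the illustration.

```latex
% Step 1: group the eta_0 terms of the energy equation
\eta_0\left(E_0 + V\frac{dE_0}{dV}\right) + E_0 = E_0^a
% Step 2: use P_0 = -d(V E_0)/dV, i.e. E_0 + V dE_0/dV = -P_0
E_0 - \eta_0 P_0 = E_0^a
% Step 3: multiply the multiplier relation by 3 P_0
\eta_0 P_0 = 3\left[(1 + \gamma_0)P_0 + V\frac{dP_0}{dV}\right]
% Step 4: substitute and write K_0 = -V dP_0/dV
E_0 - 3\left[(1 + \gamma_0)P_0 - K_0\right] = E_0^a
```

Under these assumptions the multiplier drops out entirely, leaving a single relation between the relativistic and nonrelativistic zero-temperature energy densities.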

Equations (18) and (19) can be rewritten as

$$\left(T\frac{\partial}{\partial T} + f\frac{\partial}{\partial v} + M\right)E = \Phi^a \qquad (25)$$

$$\left(T\frac{\partial}{\partial T} + h\frac{\partial}{\partial v} + N\right)P = 0 \qquad (26)$$

where

$$V = e^v \qquad (27)$$

$$f = \eta - b \qquad (28)$$

$$h = \frac{1}{\eta/3 - \gamma} \qquad (29)$$

$$M = f + 1 \qquad (30)$$

$$N = h - 1 \qquad (31)$$

$$\Phi^a = \left(T\frac{\partial}{\partial T} - b^a\frac{\partial}{\partial v} + 1 - b^a\right)E^a \qquad (32)$$

Equations (25) and (26) are immediately recognized to be similar in form to
the Callan-Symanzik equations that describe the renormalization group.1-3
The physical meaning of equations (25) and (26) is that T and V can be
considered to be arbitrary parameters and that the trace equation of relativistic
thermodynamics is form invariant for any choices of values for T and V, i.e.,
arbitrary values of temperature and volume are acceptable in equation (2).
Equations (25) and (26) connect the matter fields E and P to the gauge fields
$\gamma$ and b. Using equations (25) and (26) allows f and h to be written as
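$$f = \frac{\Phi^a - E - T\dfrac{\partial E}{\partial T}}{E + V\dfrac{\partial E}{\partial V}} \qquad (33)$$

$$h = \frac{P - T\dfrac{\partial P}{\partial T}}{P - K_T} \qquad (34)$$

which for the case T = 0 become

$$f_0 = \frac{E_0 - E_0^a}{P_0} \qquad (35)$$

$$h_0 = \frac{P_0}{P_0 - K_0} \qquad (36)$$

The functions f and h are the thermodynamic analogs of the Gell-Mann-Low
functions.1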

In analogy to equation (27) the introduction of

$$T = e^t \qquad (37)$$

allows equations (25) and (26) to be rewritten in simpler form by evaluating
the derivatives along two characteristic curves as follows

$$\frac{dE}{dt} + ME = \Phi^a \qquad (38)$$

$$\frac{dP}{dt} + NP = 0 \qquad (39)$$

The two characteristic curves are defined by

$$f(V,T) = \frac{dv}{dt} = \frac{T}{V}\frac{dV}{dT} \qquad (40)$$

$$h(V,T) = \frac{dv}{dt} = \frac{T}{V}\frac{dV}{dT} \qquad (41)$$

The characteristic equations (38) and (39) can be solved formally as

$$E = Ce^{-\int M\,dt} + e^{-\int M\,dt}\int \Phi^a e^{\int M\,dt}\,dt \qquad (42)$$

$$P = De^{-\int N\,dt} \qquad (43)$$

where C and D are constants. Therefore, along the characteristic curves
the solutions to equations (25) and (26) assume a simple form.
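
The formal solution for the pressure along its characteristic curve, P = D exp(-∫N dt), can be checked numerically against a direct integration of dP/dt + N P = 0. The sketch below is illustrative only: it assumes N has already been reduced to a function of t alone along the curve, and the linear N(t) and its antiderivative are hypothetical choices, not forms given in the paper.

```python
import math


def integrate_pressure(n_of_t, p_start, t_start, t_end, steps=1000):
    """Integrate dP/dt = -N(t) * P with the classical fourth-order Runge-Kutta scheme."""
    h = (t_end - t_start) / steps
    t, p = t_start, p_start
    for _ in range(steps):
        k1 = -n_of_t(t) * p
        k2 = -n_of_t(t + h / 2) * (p + h * k1 / 2)
        k3 = -n_of_t(t + h / 2) * (p + h * k2 / 2)
        k4 = -n_of_t(t + h) * (p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return p


def closed_form_pressure(n_integral, p_start, t_start, t_end):
    """Formal solution P = D * exp(-int N dt), with D fixed by P(t_start) = p_start."""
    return p_start * math.exp(-(n_integral(t_end) - n_integral(t_start)))


def example_n(t):
    """Hypothetical N(t) along the characteristic curve."""
    return 0.5 + 0.1 * t


def example_n_integral(t):
    """Antiderivative of example_n."""
    return 0.5 * t + 0.05 * t * t


if __name__ == "__main__":
    print(integrate_pressure(example_n, 3.0, 0.0, 2.0))
    print(closed_form_pressure(example_n_integral, 3.0, 0.0, 2.0))
```

Here t = ln T, so the interval from t = 0 to t = 2 corresponds to a temperature ratio of e^2.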

The requirement of local scale (gauge) invariance demands that equation
(2) be invariant under transformations of the form $P \to P' = Pe^{-\phi}$ and
$E \to E' = Ee^{-\psi}$ where $\phi$ and $\psi$ are functions of V and T.7 These transformations
describe a correspondence between physical states having different correlation
lengths. The scale invariance condition is taken to be analogous to the
condition of local gauge invariance, and yields the following differential
equations for the gauge parameters using the $e^{-\phi}$ and $e^{-\psi}$ transformations7

$$\left(\frac{d\gamma}{dP}\right)^- = \frac{\gamma\dfrac{T}{\phi}\dfrac{\partial\phi}{\partial T} - \dfrac{V}{\phi}\dfrac{\partial\phi}{\partial V}}{P - T\dfrac{\partial P}{\partial T} + PT\dfrac{\partial\phi}{\partial T}} \qquad (44)$$

$$\left(\frac{db}{dE}\right)^- = \frac{\dfrac{T}{\psi}\dfrac{\partial\psi}{\partial T} - b\dfrac{V}{\psi}\dfrac{\partial\psi}{\partial V}}{E + V\dfrac{\partial E}{\partial V} - EV\dfrac{\partial\psi}{\partial V}} \qquad (45)$$

These are the potential forms of the renormalization group equations for the
relativistic ground state of a thermodynamic system; they correspond to scale
transformations with negative signs in front of the potential functions $\phi$ and $\psi$.


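From equations (44) and (45) it is clear that the denominators of these
equations are not symmetrical under $\phi \to -\phi$ and $\psi \to -\psi$. In fact, the
requirement of scale invariance for equation (2) under the transformations
$P \to P' = Pe^{+\phi}$ and $E \to E' = Ee^{+\psi}$ yields the following result

$$\left(\frac{d\gamma}{dP}\right)^+ = \frac{\gamma\dfrac{T}{\phi}\dfrac{\partial\phi}{\partial T} - \dfrac{V}{\phi}\dfrac{\partial\phi}{\partial V}}{P - T\dfrac{\partial P}{\partial T} - PT\dfrac{\partial\phi}{\partial T}} \qquad (46)$$

$$\left(\frac{db}{dE}\right)^+ = \frac{\dfrac{T}{\psi}\dfrac{\partial\psi}{\partial T} - b\dfrac{V}{\psi}\dfrac{\partial\psi}{\partial V}}{E + V\dfrac{\partial E}{\partial V} + EV\dfrac{\partial\psi}{\partial V}} \qquad (47)$$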

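However, the physical derivatives of the gauge parameters must be independent
of the signs of the potential functions that appear in the scale
transformations, and the simplest way to accomplish this is to assume that the physical
derivatives are given by the following symmetric forms

$$\frac{d\gamma}{dP} = \frac{1}{2}\left[\left(\frac{d\gamma}{dP}\right)^- + \left(\frac{d\gamma}{dP}\right)^+\right] \qquad (48)$$

$$\frac{db}{dE} = \frac{1}{2}\left[\left(\frac{db}{dE}\right)^- + \left(\frac{db}{dE}\right)^+\right] \qquad (49)$$

Equations (48) and (49) can be rewritten as

$$\frac{d\gamma}{dP} = \frac{\left(P - T\dfrac{\partial P}{\partial T}\right)\left(\gamma\dfrac{T}{\phi}\dfrac{\partial\phi}{\partial T} - \dfrac{V}{\phi}\dfrac{\partial\phi}{\partial V}\right)}{\left(P - T\dfrac{\partial P}{\partial T} + PT\dfrac{\partial\phi}{\partial T}\right)\left(P - T\dfrac{\partial P}{\partial T} - PT\dfrac{\partial\phi}{\partial T}\right)} \qquad (50)$$

$$\frac{db}{dE} = \frac{\left(E + V\dfrac{\partial E}{\partial V}\right)\left(\dfrac{T}{\psi}\dfrac{\partial\psi}{\partial T} - b\dfrac{V}{\psi}\dfrac{\partial\psi}{\partial V}\right)}{\left(E + V\dfrac{\partial E}{\partial V} - EV\dfrac{\partial\psi}{\partial V}\right)\left(E + V\dfrac{\partial E}{\partial V} + EV\dfrac{\partial\psi}{\partial V}\right)} \qquad (51)$$

Equations (50) and (51) can be rewritten as

(50a)

(51a)

Equation (50) has the obvious property that for an ideal gas $d\gamma/dP = 0$ because
in this case P = nRT, and this agrees with the fact that the Grüneisen
parameter for an ideal gas is given by $\gamma = 2/3$. Equations (50) and (51) are the
potential forms of the renormalization group equations for the relativistic
thermodynamic ground state.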
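
The symmetrization step can be illustrated with plain arithmetic. The sketch below assumes that the two sign choices of the scale transformation give slopes of the form numerator/(d + x) and numerator/(d - x), where d stands for P - T ∂P/∂T and x for PT ∂φ/∂T, and that the symmetric form is their arithmetic mean. These are readings of a damaged original, and the variable names and sample values are hypothetical.

```python
def one_sided_slopes(numerator, d, x):
    """Slopes for the two sign choices: numerator/(d + x) and numerator/(d - x)."""
    return numerator / (d + x), numerator / (d - x)


def symmetric_slope(numerator, d, x):
    """Arithmetic mean of the two one-sided slopes (assumed symmetric form)."""
    minus, plus = one_sided_slopes(numerator, d, x)
    return 0.5 * (minus + plus)


def product_form(numerator, d, x):
    """Equivalent single fraction: d * numerator / ((d + x) * (d - x))."""
    return d * numerator / ((d + x) * (d - x))


if __name__ == "__main__":
    # Illustrative values only.
    print(symmetric_slope(0.7, 2.0, 0.5), product_form(0.7, 2.0, 0.5))
    # Ideal-gas case: P = nRT gives d = P - T dP/dT = 0, so the slope vanishes.
    print(symmetric_slope(0.7, 0.0, 0.5))
```

The factor d in the numerator of the single-fraction form is what makes the slope vanish identically for an ideal gas, independent of the potential function.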
The potential function $\phi$ is related to the Debye temperature $\theta_D$ for high
temperatures ($T > \theta_D$) by $\phi = \theta_D/T$ and for low temperatures ($T < \theta_D$) by
$\phi = T/\theta_D$. For high temperatures, the use of $\phi = \theta_D/T$ in equation (50) gives


ii  „ 

dP 


(52) 


where 


D  rp  3P 

®H  T  81  -  P  -  P 


M1 


Equation  (52)  can  be  rewritten 


as 


(53) 


(54) 


ii . 

dP 


(52a) 


teXatu?:?"al  tHC  fOU°“1"8  ■»*  for  ordinary  „t„iala  «  hlgh 


T 


P 


(55) 


JLi  »,/!  T  39D\ 
6  3»  '■y\1-9-3T) 


(56) 


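It follows from equation (52) that in general $d\gamma/dP < 0$. The slow varying
Grüneisen parameter condition ($d\gamma/dP \approx 0$) for equation (52) yields7

$$\gamma = -\frac{\dfrac{V}{\theta_D}\dfrac{\partial\theta_D}{\partial V}}{1 - \dfrac{T}{\theta_D}\dfrac{\partial\theta_D}{\partial T}} \qquad (57)$$

which is a standard equation of high pressure physics.7 If $\theta_D$ does not depend
appreciably on temperature it follows from equation (57) that

$$\gamma = -\frac{V}{\theta_D}\frac{\partial\theta_D}{\partial V} \qquad (58)$$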

For low temperatures the use of $\phi = T/\theta_D$ in equation (50) yields


±L  . 

dP 


T  36P\  +  /(A.B  ) 

eD  3T  )  +  0D  3V  J  /(Vl) 


(59) 


where 


*L 


xIM 

6d  3T  / 


T  36D  \ 
% 


Equation  (59)  can  be  rewritten  as 


(60) 


(61) 


(59a) 


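The slow varying Grüneisen parameter condition ($d\gamma/dP \approx 0$) for equation (59)
also yields the result in equation (57). For the case T = 0, equation (59)
becomes

$$\frac{d\gamma_0}{dP_0} = \left(\gamma_0 + \frac{V}{\theta_D^0}\frac{\partial\theta_D^0}{\partial V}\right)\Big/ P_0 \qquad (62)$$

where $\theta_D^0$ = the T = 0 value of the Debye temperature. The slow varying Grüneisen
parameter approximation applied to equation (62) yields

$$\gamma_0 = -\frac{V}{\theta_D^0}\frac{\partial\theta_D^0}{\partial V} \qquad (63)$$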
3. LOCAL GAUGE INVARIANCE FOR PHASE ROTATIONS. The sign of the zero
temperature derivative $d\gamma_0/dP_0$ that appears in equation (62) is not obtained from
the previous analysis. An indication of what this sign is can be obtained by
introducing complex local gauge transformations for the pressure, energy
density, and the gauge parameters $\gamma$ and b. This corresponds to phase rotations
of the pressure and energy density relative to the gauge parameters as follows

$$P' = Pe^{\pm i\phi} \quad \text{and} \quad E' = Ee^{\pm i\psi}$$

and correspondingly

$$\gamma' = \gamma e^{\pm iS} \quad \text{and} \quad b' = be^{\pm iW}$$

where P', E', $\gamma'$ and b' must now be taken as complex numbers whose magnitudes are
respectively P, E, $\gamma$ and b, and where $S = S(\phi)$ and $W = W(\psi)$ are the phase
angles of the $\gamma$ and b gauge parameters respectively. Thus whereas the real
exponentials $e^{\pm\phi}$ and $e^{\pm\psi}$ correspond to changes in the pressure and energy density,
the complex exponentials correspond to phase rotations where the magnitudes P,
E, $\gamma$, and b are held fixed.

The phase angles $S(\phi)$ and $W(\psi)$ are determined from the condition of local
gauge invariance on the fundamental trace equation (2). It is first noted that
for P, E, $\gamma$, and b held fixed it follows that

$$\frac{d\gamma}{dP} = \frac{\gamma}{P}\frac{dS}{d\phi} \qquad \frac{db}{dE} = \frac{b}{E}\frac{dW}{d\psi} \qquad (63a)$$

Then it follows that the local gauge invariance conditions and the
symmetrization equations (48) and (49) yield the following renormalization group equations
for phase rotations in analogy to equations (50a) and (51a)


X  dS 
P  d  <P 


b  dW 


(63b) 


(63c) 


Note  the  positive  sign  in  the  denominators. 

For high temperatures, $\phi = \theta_D/T$ and $S = S_D/T$, where $S_D$ = characteristic
temperature of the Grüneisen parameter phase angle. In this case equation
(63b) becomes


x 

P  d<p 


(63d) 
where

(63e)

A comparison of equations (52) and (63d) shows that at high temperatures
$d\gamma/dP < 0$ and $dS/d\phi < 0$.

For low temperatures let $\phi = T/\theta_D$ and $S = T/S_D$ and get from equation (63b)


(63f) 


where the derivative $dS/d\phi$ is evaluated as follows


dS 

d<|> 


(63g) 


Thus at low temperatures $dS/d\phi > 0$ and from equations (59) and (63f) it
follows that $d\gamma/dP > 0$ at low temperatures. In particular the T = 0 case
of equation (63g) is


where $\theta_D^0$ and $S_D^0$ are the T = 0 values of $\theta_D$ and $S_D$ respectively. Therefore
for T = 0, equation (63f) becomes


K 

3V 
Note that $P_0 < 0$ for bound systems such as solids at standard pressure.
A comparison of equations (62) and (63i) then shows that


P 

o 


dP 

o 


(63  j) 


and therefore it follows that if $\theta_D^0 > 0$ and $S_D^0 > 0$ then $d\gamma_0/dP_0 > 0$, and
from equation (62) one concludes that


y 


o 


> 


(63k) 


Since in general $d\gamma/dP < 0$ at high temperatures, while equation (63j) shows
that $d\gamma/dP > 0$ at T = 0, it follows that as the temperature of a solid or
quantum liquid is lowered a temperature is finally reached at which the sign
of $d\gamma/dP$ changes from negative to positive.

It should be pointed out that a similar analysis is not possible for the
angles $\psi$ and W (associated with E and b respectively) that appear in equation
(63c) because the gauge parameter $b \to 0$ when $T \to 0$. This condition combined
with equation (63c) suggests that $\psi$ has the following low temperature form

$$\psi(V,T) = A(V)\exp\left(\int G(V,T)\,dT\right) \qquad (63\ell)$$

where G(V,T) = some polynomial function of T.

4. RENORMALIZATION GROUP EQUATIONS FOR EXCITATIONS. When electromagnetic
or mechanical waves of small amplitude are present in a thermodynamic
medium, the renormalization group equations for the excitations can be obtained
either directly as a perturbation on the ground state equation (2) for the
energy density, or as a perturbation on the potential forms of the ground
state renormalization group equations given by equations (44) through (49).
When excitations are present the pressure, energy density, bulk modulus, and
heat capacity become $P + P_r$, $E + E_r$, $K_T + K_{Tr}$, and $C_V + C_{Vr}$ respectively,
where $P_r$ = radiation pressure, $E_r$ = radiation energy density, and where

$$K_{Tr} = n\left(\frac{\partial P_r}{\partial n}\right)_T = \text{radiation bulk modulus}$$

$$C_{Vr} = V\left(\frac{\partial E_r}{\partial T}\right)_V = \text{radiation heat capacity}$$
The Grüneisen parameter for a thermodynamic medium with excitations is
obtained from equation (3) as follows

$$\gamma + \delta_r = \frac{V}{C_V + C_{Vr}}\,\frac{\partial}{\partial T}\left(P + P_r\right) \qquad (64)$$

where $\delta_r$ = change in the system Grüneisen parameter due to the presence of
radiation. Expanding equation (64), subtracting equation (3), and keeping
only first order terms gives

$$\delta_r = \frac{V}{C_V}\frac{\partial P_r}{\partial T} - \gamma\frac{C_{Vr}}{C_V} = \frac{C_{Vr}}{C_V}\left(\gamma_r - \gamma\right) \qquad (65)$$

where $\gamma_r$ = Grüneisen parameter for the radiation field itself and is given by

$$\gamma_r = \frac{\partial P_r}{\partial T}\Big/\frac{\partial E_r}{\partial T} \qquad (66)$$
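
The first-order expansion used for the radiative change in the Grüneisen parameter can be checked with a short calculation. The sketch below assumes the standard relations V ∂P/∂T = γ C_V for the medium and V ∂P_r/∂T = γ_r C_Vr for the radiation, so that the exact shift is (γ C_V + γ_r C_Vr)/(C_V + C_Vr) - γ and its first-order form is (C_Vr/C_V)(γ_r - γ). The numerical values, including γ_r = 1/3 (the value for a photon gas), are illustrative only.

```python
def exact_shift(gamma, c_v, gamma_r, c_vr):
    """Exact delta_r from gamma + delta_r = V d(P + P_r)/dT / (C_V + C_Vr).

    Uses V*dP/dT = gamma*C_V and V*dP_r/dT = gamma_r*C_Vr.
    """
    return (gamma * c_v + gamma_r * c_vr) / (c_v + c_vr) - gamma


def first_order_shift(gamma, c_v, gamma_r, c_vr):
    """First-order form: delta_r ~ (C_Vr / C_V) * (gamma_r - gamma)."""
    return (c_vr / c_v) * (gamma_r - gamma)


if __name__ == "__main__":
    # Illustrative values: weak radiation, C_Vr much smaller than C_V.
    args = (2.0, 1.0, 1.0 / 3.0, 1.0e-3)
    print(exact_shift(*args), first_order_shift(*args))
```

The two expressions differ by the factor C_V/(C_V + C_Vr), so the relative error of the first-order form is of order C_Vr/C_V, which is the small parameter of the perturbation procedure.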


The gauge parameter b for an excited thermodynamic medium is obtained from
equation (4) to be

$$b + \beta_r = \frac{T\left(\dfrac{\partial P}{\partial T} + \dfrac{\partial P_r}{\partial T}\right)}{P - K_T + P_r - K_{Tr}} \qquad (67)$$

where $\beta_r$ = change in b parameter due to the presence of radiation in the
system. Expanding equation (67), keeping only first order terms, and
subtracting equation (4) gives

$$\beta_r = \frac{T\dfrac{\partial P_r}{\partial T}}{P - K_T} - \frac{T\dfrac{\partial P}{\partial T}\left(P_r - K_{Tr}\right)}{\left(P - K_T\right)^2} = b_r\frac{P_r - K_{Tr}}{P - K_T} - \frac{T\dfrac{\partial P}{\partial T}\left(P_r - K_{Tr}\right)}{\left(P - K_T\right)^2} \qquad (68)$$

where $b_r$ = radiation gauge parameter given by

$$b_r = \frac{T\,\partial P_r/\partial T}{P_r - K_{Tr}} \qquad (68a)$$

The parameters $\gamma_r$ and $b_r$ are the two radiation gauge parameters of the thermal
medium.

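The renormalization group equations for radiation can be put into a
Callan-Symanzik form by first combining equation (2) with equations (64) and
(67) as follows

$$\left[1 - (b + \beta_r) + T\frac{\partial}{\partial T} - (b + \beta_r)V\frac{\partial}{\partial V}\right](E + E_r) - 3\left[1 + \gamma + \delta_r + V\frac{\partial}{\partial V} - (\gamma + \delta_r)T\frac{\partial}{\partial T}\right](P + P_r)$$
$$= \left[1 - (b^a + \beta_r^a) + T\frac{\partial}{\partial T} - (b^a + \beta_r^a)V\frac{\partial}{\partial V}\right](E^a + E_r^a) \qquad (69)$$

where $\beta_r^a$ is given by the nonrelativistic analog of equation (68).
Subtracting equation (2) from equation (69), and keeping only first order radiation
terms yields the following radiation equation

$$\left(1 - b + T\frac{\partial}{\partial T} - bV\frac{\partial}{\partial V}\right)E_r - \beta_r\left(T\frac{\partial P}{\partial T} - P\right) - 3\left[\left(1 + \gamma + V\frac{\partial}{\partial V} - \gamma T\frac{\partial}{\partial T}\right)P_r - \delta_r\left(T\frac{\partial P}{\partial T} - P\right)\right]$$
$$= \left(1 - b^a + T\frac{\partial}{\partial T} - b^a V\frac{\partial}{\partial V}\right)E_r^a - \beta_r^a\left(T\frac{\partial P^a}{\partial T} - P^a\right) \qquad (70)$$

where the following standard thermodynamic relation was used

$$\frac{\partial U}{\partial V} = E + V\frac{\partial E}{\partial V} = T\frac{\partial P}{\partial T} - P \qquad (71)$$

Equation (70) is a first order radiation equation that can be applied to
any thermodynamic system such as gases, solids, and quantum liquids.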
Equation (70) can be separated into two radiation equations each of
which is similar in form to the Callan-Symanzik equation. This is done by
using the Gibbs-Helmholtz equation (71), which for radiation becomes

$$\frac{\partial U_r}{\partial V} = E_r + V\frac{\partial E_r}{\partial V} = T\frac{\partial P_r}{\partial T} - P_r \qquad (72)$$

Introducing a radiation Lagrange multiplier $\eta_r$ as follows

$$\eta_r\left[E_r + V\frac{\partial E_r}{\partial V} + P_r - T\frac{\partial P_r}{\partial T}\right] = 0 \qquad (73)$$

allows equation (70) to be separated as follows

$$\left(T\frac{\partial}{\partial T} + f_r\frac{\partial}{\partial v} + M_r\right)E_r - \beta_r\left(T\frac{\partial P}{\partial T} - P\right) = \Phi_r^a \qquad (74)$$

(75)

where v is defined in equation (27), and where

Equations (72), (74), and (75) are coupled radiation equations that give $E_r$,
$P_r$, and $\eta_r$. These equations simplify somewhat for solids and quantum
liquids.7
The potential form of the renormalization group equations for radiation
will now be obtained by a perturbation procedure that is applied to equations
(44) through (49). When excitations are present the renormalization group
equations (44) through (47) become


(  +  a  \  T  / 11  .  V  _V_/3i  _^r\ 
lY  V(<fr  +  $f)  \  3T  3T  /  "  (♦  +  <Pr)  \  9V  9V  ) 

P  +  Pr  -  T(  It  +  w)  *  <P  +  Pr>T(H  +  W-) 

(Y  *  V  w  l  »r>(lr +  ir )  -  (*  r*;>  (lv +  w) 
P  +  Pr-T(f +  ir)-  <p  +  VT(lf  +  ir) 


(81) 


(82) 


T  IH  .  Wr\  ,  .  V  /jj  ^r\ 

(♦  +  ♦  )  \  3T  3T  /  V  (< |»  +  ¥  )  \  3V  +  3V  / 

E  +  Er  +  V(  If  +  1\r)  '  <E+  Er)V(lv +'SV£) 


(83) 


<♦  l  *r)  (If  +  ir)  -  (b  +  6r>  «TTTT)-(lv  +  3V1) 

E  +  Er+V(lf  +  WI)+<E+Er>V(lf  +  ^) 


(84) 


Expanding  equations  (81)  through  (84),  keeping  only  first  order  terms,  and 
subtracting  equations  (44)  through  (47)  gives 


(85) 


(86) 


(86a) 
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(87) 


3Pr  a* 

B  -  p  -  T  — +  p  T  — +  PT 
r  r  3T  r  3T 


d<f> 

1 

3T 


3P  3<t> 

Br  "  Pr  *  T  ST  -  PrT  W  "  PT  3T 


and 


/  dM"  .  /«!>.  f  dEr  .  Cr  ~(ll)  Dr 
\dF"/  \dt/  dE  E  +  V  ||  -  EV  ■ 


3i|> 

3V 


(d6rf  /db\+dEr,  Cr-(3F)+Br 

*(*)  <r  E  +  v||  +  Ev  H 


3P  9E 

dE  E  +  P  -  T  -T--  +  -T~-  n  ~ 
_ r  r  r _ 3T  3T  dn 

dE  E  4.  p  -  t  —  +  — n 

3T  3T  dn 


l!!£  +  !r/b  vil  _Iit\  .I(b!!r+  \ 

^  3T  t|)  \  i|(  3V  ip  3T  /  ^  \  3V  Pr  3V  / 


3E  3^ 

E  +  v _ -  -  E  V  ^  -  EV  — — 

r  3V  r  3V  3V 


3E  W 

D+  -  E  +  V  — —  +  E  V  ^  +  EV  — - 
r  r  3V  r  3V  3V 


(88) 


(89) 


(90) 


(91) 


(91a) 


(92) 


(93) 


(94) 


where $(d\gamma/dP)^-$ and $(d\gamma/dP)^+$ are given by equations (44) and (46), while $(db/dE)^-$ and $(db/dE)^+$ are given by equations (45) and (47).

The  symmetric  equations  that  describe  the  variation  of  the  radiation 
gauge  parameters  are  then  given  by 




(95) 


(96) 


Equations (95) and (96) are the potential forms of the renormalization group equations for a thermodynamic system that contains radiation. These equations determine the radiation potentials $\phi_r$ and $\psi_r$. The potential function $\phi_r$ is related to the radiative change in the Debye temperature $\theta_{Dr}$ by $\phi_r = \theta_{Dr}/T$ for high temperatures and by $\phi_r = T/\theta_{Dr}$ for low temperatures. The derivatives on the left side of equations (95) and (96) can be evaluated using equations (65) and (68) and the following simple relationships


d<5 

_ i 

dP 


n 


36  96 

_ r  _ r  dT 

3n  3T  n  dn 


v  +  IZ.  „  dT 

K-,  t  ^  n 


3T  dn 


d6r 


36  96 

r  _ r  dT 

3n  3T  n  dn 


E  +  P  -  T 


3P  W  dT 
3T  3T  "  dn 


(97) 


(98) 


5. WAVES IN SOLIDS AND QUANTUM LIQUIDS. Excitations in relativistic solids and quantum liquids have already been considered using some simplifying assumptions.7 A general procedure for calculating the amplitude and spectrum of relativistic waves in solids and quantum liquids will be outlined here. The energy density and pressure for radiation in these systems are written as


\[
E_r^a = E_{or}^a + E_{jr}^a T^j + \cdots
\tag{99}
\]

\[
P_r^a = P_{or}^a + P_{jr}^a T^j + \cdots
\tag{100}
\]

and

\[
E_r = E_{or} + E_{jr} T^j + \cdots
\tag{101}
\]

\[
P_r = P_{or} + P_{jr} T^j + \cdots
\tag{102}
\]

where

$E_{or}^a$ and $P_{or}^a$ = nonrelativistic zero-temperature radiation energy density and pressure respectively

$E_{jr}^a$ and $P_{jr}^a$ = nonrelativistic thermal coefficients for the radiation energy density and pressure respectively

$E_{or}$ and $P_{or}$ = relativistic zero-temperature radiation energy density and pressure respectively

$E_{jr}$ and $P_{jr}$ = relativistic thermal coefficients for the radiation energy density and pressure respectively


The zero temperature value of the radiation Grüneisen parameter is obtained from equations (66) and (99) through (102) to be

\[
\gamma_{or}^a = \frac{P_{jr}^a}{E_{jr}^a}, \qquad \gamma_{or} = \frac{P_{jr}}{E_{jr}}
\tag{103}
\]


The zero temperature values of the nonrelativistic and relativistic radiation bulk modulus are written as $K_{or}^a = n\,dP_{or}^a/dn$ and $K_{or} = n\,dP_{or}/dn$ respectively.

The  basic  relativistic  equations  describing  excitations  in  solids  and 
quantum  liquids  are  written  as  (equations  76  and  77  of  Reference  7) 


E  -  3[(1  +  y  )P  -  R  ]  -  3  P  (Y  -y)«Ea  (104) 

or  '  'o'  or  or  E^  o' 'or  'o'  or 

jE.  (aK  -  6P  )  +  E.  S.  -  jE^faV  -  0aPa  I  +  E®  T?  (105) 

J  j'  or  or'  Jr  jr  J  j\  or  or /  jr  jr 

where 


Jr 


.  .  .  .  jVor  .  , 

1  *  1  +  T  -  K  +3ndn 
o  o 


or  -  3U-l)(Yor  -  Y0)2 


(106) 


1  +  j 


+ 


,na  a 

jP  y 

J  o  or 


(107) 
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and  where 


Y  P 
'o  o 


<p0  -  Ko>‘ 


o  - 


a_a 

Y  p 
o  o 


(pa  -  K*) 
o  o 


(108) 


0  - 


Y  K 
'o  o 


<Po  -  V 


Y*Ka 

0a  -  00 


<pa  -  Kar 
'  o  o' 


(109) 


Equations (104) and (105) can be deduced directly from equation (70) by using equations (11) and (101). For example, the expression for $\delta_r$ that appears in equation (65) can be evaluated for the zero temperature case of solids and quantum liquids to be


Sor  -  ^  <Yor  '  Yo> 


(110) 


Using  the  following  basic  relationships 


\[
P_{or} = n\frac{dE_{or}}{dn} - E_{or}
\tag{111}
\]

\[
K_{or} = n\frac{dP_{or}}{dn} = n^2\frac{d^2E_{or}}{dn^2}
\tag{112}
\]


allows  equations  (104)  and  (105)  to  be  written  as 


3n 


2  TT-  -  3(1  +  Yo)n  +  (3yo  +  4)Eor  +  3  if  Po(Yo  -  Yor>  '  E 
dn  j 

(113) 


a 

or 


.  d*E  dE  E4  S4 

.2  or  or  .  „r  .  1r  it 

an  — z - Bn  -j- —  +  0c  +  4p- 

dn2  dn  0t 


(114) 


Eaf  ,  d2Ea  .  dE*  ,1 
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The  quantities  Sjr  and  Tjr  are  functions  of  Yor  and  Ygr  »  while  the  ratios 

Ejr/Ej  ,  Ej /Ej  ,  and  Ear/Ej  are  functions  of  Yor  *  Yar  »  Y0  and  y§  • 7 

Equations (113) and (114) are simultaneous second order differential equations that determine $E_{or}$ and $\gamma_{or}$ in terms of $E_{or}^a$, $\gamma_{or}^a$, and the parameters of the ground state.

Equations (113) and (114) must be solved simultaneously to obtain the relativistic wave amplitude and the relativistic wave number in terms of the corresponding nonrelativistic values. However, in order to obtain an approximate value for the relativistic wave amplitude and wave number, only equation (113) will be used. The zero temperature values of the nonrelativistic and relativistic radiation energy densities are respectively written as10

\[
E_{or}^a = \tfrac{1}{4} K_o^a k_a^2 A_a^2
\tag{115}
\]

\[
E_{or} = \tfrac{1}{4} K_o k^2 A^2
\tag{116}
\]

where $k_a$ and $A_a$ = nonrelativistic wave number and wave amplitude respectively, and $k$ and $A$ = relativistic wave number and wave amplitude respectively.

Placing  equations  (115)  and  (116)  into  equation  (113)  gives 


3  v  2d  (,2.2s  .  3 

IV  7T(k  A  >+4 

dn 


dK 

2n  *T  -  +  VKo 


d  fi2A2\ 

n  dn  ( k  A  ) 


(117) 


.  1  ,2.2 
>  —  k  A 
4 


'  d2K  dK 

3“  “T  ’3(1  +  Y0)n35a+  (3yo  +  4>*0 

dn 


1  iA2va 
+  g  ■  -r  k  A  K 
4  a  a  o 


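where

\[
g = 3\,\frac{E_{jr}}{E_j}\,P_o(\gamma_o - \gamma_{or})
\tag{118}
\]

As a crude approximation take $\gamma_{or} = 1/3$, $E_{jr}/E_j = E_{or}/E_o$, and $P_o \sim n^{\sigma_o}$ where $\sigma_o$ = adiabatic index, and assume $kA$ is not explicitly density dependent, and get

\[
g \approx 3E_{or}(\sigma_o - 1)(\gamma_o - \tfrac{1}{3})
\tag{119}
\]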


and 


i  2.2 

K  A 


3n 


2  «2ko 


dK 


dn 


« - 3(1  +  Y„)n  —  +  (3o  y  -  o  +  5)K 

2  o  dn  o  o  o  < 


1  -  AV 

)  a  a  c 


(120) 
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Using  Kq  'v  n°°  in  equation  (120)  gives 


kaA2 


k2A2Ka/K 
a  a  o  o 

3o2  -  7a  +  5 
o  o 


(121) 


.2A2 
~  k  A 
a  a 


3a  -  3o  (2  +  y  )  +  3y 
o  o  o  o 


3a  -  7a  +  5 
o  o 


-) 


The relative values of $kA$ and $k_aA_a$ at low and high densities depend on the values of $\sigma_o$ and $\gamma_o$ at these densities. For a low density Fermi gas where $\sigma_o = 5/3$ and $\gamma_o = 2/3$ one has $kA = 0.77\,k_aA_a$, and for a solid where $\sigma_o \sim 8$ and $\gamma_o \sim 3.83$ the result is $kA = 0.69\,k_aA_a$. The high density limit of equation (121) is somewhat more delicate. If the high density limit is associated with asymptotic freedom, then $\sigma_o = 4/3$ and $\gamma_o = 1/3$ and $kA = k_aA_a$. On the other hand, if at high densities the interactions increase without limit and $\sigma_o \to \infty$, but with $\gamma_o$ = constant, then $kA \to k_aA_a$. However, $\gamma_o$ is probably a function of $\sigma_o$ and may be written as $\gamma_o = \sigma_o - 4/3$.8 In this case equation (121) goes as $\sigma_o/(3\sigma_o^2)$ as $\sigma_o \to \infty$ so that $kA/(k_aA_a) \to 0$. This behaviour contrasts with the results of the first order radiation differential equation approximation of Reference 7, where $kA/(k_aA_a) \to 3\sigma_o^2/\sigma_o$ and is large for $\sigma_o \to \infty$ with $\gamma_o$ = constant while $kA/(k_aA_a) \to \sigma_o/\sigma_o = 1$ for $\sigma_o \to \infty$ with $\gamma_o = \sigma_o - 4/3$. Finally for the case of asymptotic freedom with $\sigma_o = 4/3$ and $\gamma_o = 1/3$ the first order differential equation approximation gives $kA = 0.65\,k_aA_a$. Thus the effect of relativistic thermodynamics on wave amplitudes is system and model dependent.
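The numerical values quoted above can be reproduced with a short sketch. It assumes that the power-law form of equation (121) reads (kA)^2/(k_a A_a)^2 = [3s^2 - 3s(2 + g) + 3g + 4] / [3s^2 - 7s + 5], with s the adiabatic index and g the ground state Grüneisen parameter, and that the ratio of nonrelativistic to relativistic bulk modulus is taken as unity. This closed form is a reading inferred from the surviving terms of the equation and from the quoted results, not a verified transcription; the function name is hypothetical.

```python
import math


def amplitude_ratio(sigma_o, gamma_o):
    """Return kA/(k_a A_a) under the assumed power-law form of equation (121).

    Assumptions (not confirmed by the source): K_o^a/K_o = 1 and the
    numerator constant is +4.
    """
    numerator = 3.0 * sigma_o**2 - 3.0 * sigma_o * (2.0 + gamma_o) + 3.0 * gamma_o + 4.0
    denominator = 3.0 * sigma_o**2 - 7.0 * sigma_o + 5.0
    return math.sqrt(numerator / denominator)


cases = {
    "low density Fermi gas": (5.0 / 3.0, 2.0 / 3.0),
    "solid": (8.0, 3.83),
    "asymptotic freedom": (4.0 / 3.0, 1.0 / 3.0),
}
for name, (sigma_o, gamma_o) in cases.items():
    print(name, round(amplitude_ratio(sigma_o, gamma_o), 2))
```

With the substitution gamma_o = sigma_o - 4/3 the assumed numerator reduces to sigma_o, which reproduces the stated high density behaviour sigma_o/(3 sigma_o^2).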


6. RELATIVISTIC PHASE VELOCITY. A general procedure is given for determining the relativistic phase velocity for waves in solids and quantum liquids. The procedure will be first to determine $E_{or}^a$ and $\gamma_{or}^a$ from the values of the nonrelativistic sound speed, then to solve for the relativistic quantities $E_{or}$ and $\gamma_{or}$ using equations (113) and (114), and then finally to work backward to determine the relativistic sound speed from the relativistic energy density and Grüneisen parameter for the radiation.

The  nonrelativistic  expression  for  the  phase  velocity  of  mechanical 
waves  is  given  by8 


vf  +  YaT  El 

+  Y  T  9T 

£a  +  ya0a 


(122) 


where 


Z 


a 


Ea  +  Pa 


(123) 
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(124) 


G 


a 


where $W^a$ = nonrelativistic sound speed, and c = light speed. The zero temperature limit of equation (122) is7,8


( 


Ea  +  Pa  +  Ka 
o  o  o 


(125) 


where $W_o^a$ = sound speed in a T = 0 solid or quantum liquid.


The  nonrelativistic  diffuse  radiation  factor  is  defined  to  be 


\[
\Gamma_r^a = \frac{P_r^a}{E_r^a}
\tag{126}
\]


For  isotropic  radiation  the  diffuse  radiation  factor  can  be  expressed  as 
follows10 


\[
\Gamma_r^a = \frac{1}{3} + \frac{n}{W^a}\frac{dW^a}{dn}
\tag{127}
\]


The  phase  velocity  that  appears  in  equation  (122)  through  (124)  can  be 
expanded  in  powers  of  the  temperature,  so  that  the  diffuse  radiation  factor 
can  be  written  in  the  following  general  form 

\[
\Gamma_r^a = \Gamma_{or}^a + \Gamma_{jr}^a T^j + \cdots
\tag{128}
\]

Therefore  the  coefficients  of  the  diffuse  radiation  are  expressed,  through 
the  phase  velocity  Wa  ,  in  terms  of  the  ground  state  functions  Eg  ,  P®  ,  E§  , 
and  . 

Using  equations  (99)  and  (100)  to  represent  the  radiation  pressure  and 
energy  density  that  appear  in  the  defining  equation  (126)  for  the  diffuse 
radiation  factor,  expanding  the  denominator,  and  keeping  only  first  order 
terms  yields  the  following  results 




(129) 


or 


(130) 


where the left hand sides of these equations are obtained from the sound speed.
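The expansion described above can be sketched numerically. The code assumes that the diffuse radiation factor is the ratio of radiation pressure to radiation energy density, and that both are truncated after the first thermal term as in equations (99) and (100). The coefficient formulas in the code follow from that assumption by a first order expansion of the denominator; they are not quoted from the paper, and all names and numerical values are hypothetical.

```python
def gamma_coefficients(E0, Ej, P0, Pj):
    """First order coefficients of P/E for E = E0 + Ej*T**j, P = P0 + Pj*T**j.

    Returns (gamma0, gammaj) such that P/E ~ gamma0 + gammaj*T**j.
    """
    gamma0 = P0 / E0
    gammaj = (Pj - gamma0 * Ej) / E0
    return gamma0, gammaj


def gamma_exact(E0, Ej, P0, Pj, T, j):
    """Exact ratio P/E for the truncated series, for comparison."""
    return (P0 + Pj * T**j) / (E0 + Ej * T**j)


# Hypothetical coefficients chosen only to exercise the expansion.
E0, Ej, P0, Pj, j, T = 2.0, 0.5, 1.0, 0.4, 2, 0.01
gamma0, gammaj = gamma_coefficients(E0, Ej, P0, Pj)
approx = gamma0 + gammaj * T**j
exact = gamma_exact(E0, Ej, P0, Pj, T, j)
print(approx, exact)
```

The thermal coefficient can equivalently be written as (Ej/E0)*(Pj/Ej - gamma0), which is the form that involves the ratio of thermal coefficients.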

Equations (129) and (130) can be used to determine $E_{or}^a$, $E_{jr}^a$, and $\gamma_{or}^a$. For instance, placing equation (111) into equation (129) gives


\[
\Gamma_{or}^a = \frac{n}{E_{or}^a}\frac{dE_{or}^a}{dn} - 1
\tag{131}
\]


which is a differential equation that can be solved for $E_{or}^a$, since $\Gamma_{or}^a$ is known from the ground state parameters through equations (122) and (127). In fact, the solution of equation (131) is

\[
E_{or}^a = n D_{or}^a \exp\left[\int \Gamma_{or}^a \frac{dn}{n}\right]
\tag{132}
\]

where $D_{or}^a$ = constant. The determination of $E_{jr}^a$ and $\gamma_{or}^a$ from equation (130) goes as follows. It is easily shown that7


(133) 


where $D_{jr}^a$ = constant. Placing equation (133) into equation (130) gives an integral equation which can be solved for $\gamma_{or}^a$. In this way the nonrelativistic radiation energy density and Grüneisen parameter, $E_{or}^a$ and $\gamma_{or}^a$ respectively, can be determined from the phase velocity.
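The first step of this procedure can be sketched as follows, assuming that equation (131) has the form (n/E) dE/dn - 1 = Gamma(n), so that its solution is E(n) = D n exp[integral of Gamma dn/n]. The function name, the reference density, and the sample Gamma(n) are hypothetical; the integral is evaluated with a simple trapezoid rule in ln n.

```python
import math


def energy_from_gamma(gamma, n, n_ref=1.0, D=1.0, steps=2000):
    """E(n) = D * n * exp( integral from n_ref to n of gamma(x)/x dx ).

    gamma is a callable giving the zero temperature diffuse radiation
    factor as a function of density (assumed known from the sound speed).
    """
    a, b = math.log(n_ref), math.log(n)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(a + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gamma(x)  # integrand in the variable ln x
    return D * n * math.exp(total * h)


# Constant factor 1/3 gives the power law E proportional to n**(4/3).
print(energy_from_gamma(lambda x: 1.0 / 3.0, 8.0))
```

For a density dependent factor, the result can be checked by differentiating E(n) numerically and comparing (n/E) dE/dn - 1 with the input Gamma(n).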

The corresponding relativistic values of the radiation energy density $E_{or}$ and Grüneisen parameter $\gamma_{or}$ are obtained by the solution of the simultaneous equations (113) and (114). The relativistic thermal energy density coefficient is then determined by7
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(134) 


nDjr  exp[-  (J-l)  J  ror 


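where $D_{jr}$ = constant. The relativistic diffuse radiation factor coefficients are then calculated by

\[
\Gamma_{or} = \frac{n}{E_{or}}\frac{dE_{or}}{dn} - 1
\tag{135}
\]

\[
\Gamma_{jr} = \frac{E_{jr}}{E_{or}}(\gamma_{or} - \Gamma_{or})
\tag{136}
\]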


The  relativistic  diffuse  radiation  factor  is  then  written  as 


\[
\Gamma_r(n,T) = \Gamma_{or} + \Gamma_{jr} T^j + \cdots
\tag{137}
\]


P 

r 

T 


Finally  the  relativistic  phase  velocity  is  obtained  as  a  solution  to  the 
following  equation 


\[
\Gamma_r = \frac{1}{3} + \frac{n}{W}\frac{dW}{dn}
\tag{138}
\]


which  can  be  written  as 


W 

“  “  exp 


(139) 


If it is assumed that the diffuse radiation factor is independent of temperature it follows from equations (136) and (137) that

\[
\Gamma_r = \Gamma_{or} = \gamma_{or}
\tag{140}
\]

In this case it follows that $P_{or} = \gamma_{or}E_{or}$, and the wave equations (104) and (105) reduce to a set of coupled first order differential equations instead of the second order differential equations that appear in equations (113) and (114).7
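The last step of the procedure, recovering the phase velocity from the diffuse radiation factor, can be sketched under the assumption that equation (138) reads Gamma_r = 1/3 + (n/W) dW/dn. Integrating gives W(n) = W_ref exp[integral of (Gamma_r - 1/3) dn/n]. The function name, the reference values, and the sample factor are hypothetical.

```python
import math


def phase_velocity(gamma_r, n, n_ref, W_ref, steps=2000):
    """W(n) obtained by integrating (n/W) dW/dn = gamma_r(n) - 1/3.

    gamma_r is a callable; W_ref is the phase velocity at density n_ref.
    The integral is taken in the variable ln n with the trapezoid rule.
    """
    a, b = math.log(n_ref), math.log(n)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(a + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (gamma_r(x) - 1.0 / 3.0)
    return W_ref * math.exp(total * h)


# A constant factor of 4/3 gives W proportional to n.
print(phase_velocity(lambda x: 4.0 / 3.0, 2.0, 1.0, 1500.0))
```

A factor equal to 1/3 at all densities leaves the phase velocity independent of density, which is a convenient sanity check.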

7. SCALING THE WAVE AMPLITUDE. This section obtains a simple expression for the wave amplitude in a T = 0 system. Combining equation (135) with the T = 0 form of equation (138) yields

\[
\frac{n}{E_{or}}\frac{dE_{or}}{dn} = \frac{4}{3} + \frac{n}{W_o}\frac{dW_o}{dn}
\tag{141}
\]

where $W_o$ = zero temperature value of the relativistic phase velocity. An equation analogous to (141) holds for the corresponding nonrelativistic quantities. The solution of equation (141) is easily obtained to be

\[
E_{or} = G_r W_o n^{4/3}
\tag{142}
\]

where $G_r$ = constant independent of n. The relationship between wave number and phase velocity is $W_o = \omega/k$, where $\omega$ = angular frequency. Using this in equation (116) gives the following expression for the radiation energy density

\[
E_{or} = \frac{1}{4}\frac{\omega^2 A^2 K_o}{W_o^2}
\tag{143}
\]

Combining equation (142) and (143) gives the wave amplitude as

\[
A^2 = \frac{4 G_r W_o^3 n^{4/3}}{\omega^2 K_o}
\tag{144}
\]

Let $n$ and $n_1$ be two particle number densities, then it follows from equation (144) that

\[
\left[\frac{A(n)}{A(n_1)}\right]^2 = \left(\frac{n}{n_1}\right)^{4/3}\left[\frac{W_o(n)}{W_o(n_1)}\right]^3\frac{K_o(n_1)}{K_o(n)}
\tag{145}
\]
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which  is  the  scaling  equation  for  wave  amplitudes  under  a  change  in  density 
in  a  T  =  0  system.  Equation  (145)  is  expected  to  be  a  good  approximation 
for  finite  temperature  systems  if  the  corresponding  finite  temperature  para¬ 
meters  are  used. 
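A minimal sketch of the scaling relation follows. It assumes the reading A(n)^2/A(n_1)^2 = (n/n_1)^(4/3) [W_o(n)/W_o(n_1)]^3 K_o(n_1)/K_o(n) at fixed angular frequency, which is what results from eliminating the radiation energy density between E_or = G_r W_o n^(4/3) and E_or = (1/4) omega^2 A^2 K_o / W_o^2. The function name and the sample phase velocity and bulk modulus laws are hypothetical.

```python
import math


def amplitude_scale(n, n1, W_o, K_o):
    """Return A(n)/A(n1) at fixed angular frequency.

    W_o and K_o are callables giving the zero temperature phase velocity
    and bulk modulus as functions of the particle number density.
    """
    ratio_sq = (n / n1) ** (4.0 / 3.0) * (W_o(n) / W_o(n1)) ** 3 * K_o(n1) / K_o(n)
    return math.sqrt(ratio_sq)


# Hypothetical material laws, for illustration only.
scale = amplitude_scale(8.0, 1.0, W_o=lambda n: n, K_o=lambda n: 1.0)
print(scale)
```

If the phase velocity is constant and the bulk modulus grows as n^(4/3), the density dependence cancels and the amplitude is unchanged.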


8. ELECTROMAGNETIC WAVES IN MATTER. The relativistic calculation of the energy density and phase velocity of electromagnetic waves in matter proceeds in a manner analogous to the case of mechanical waves. The relativistic and nonrelativistic electromagnetic energy densities at zero temperature are given by

\[
E_{or} = \tfrac{1}{2}(\epsilon_o E^2 + \mu_o H^2)
\tag{146}
\]

\[
E_{or}^a = \tfrac{1}{2}(\epsilon_o^a E_a^2 + \mu_o^a H_a^2)
\tag{147}
\]

where $E$ and $H$ = relativistic electric and magnetic radiation fields respectively; $E_a$ and $H_a$ = nonrelativistic electric and magnetic radiation fields respectively; $\epsilon_o$, $\mu_o$ and $\epsilon_o^a$, $\mu_o^a$ = zero temperature values of the relativistic and nonrelativistic permittivities and permeabilities respectively. The thermal part of the radiation energy density and pressure is written in the form of equations (101) through (103), and therefore the determination of $E_{or}$ and $\gamma_{or}$ is necessary for a relativistic description of electromagnetic waves in matter. The crude approximation $\gamma_{or} = 1/3$ is made in this section, and the problem is to determine $\epsilon_o$, $\mu_o$, $E$, and $H$.

It will be assumed that the nonrelativistic zero temperature values of the permittivity and permeability can be represented by some theoretical expressions in terms of the density, pressure and Grüneisen parameters as follows

\[
\epsilon_o^a = X[n, P_o^a(n), \gamma_o^a(n)]
\tag{148}
\]

\[
\mu_o^a = Y[n, P_o^a(n), \gamma_o^a(n)]
\tag{149}
\]

Then the relativistic values $\epsilon_o$ and $\mu_o$ are determined using the same functional relationships but now evaluated for the relativistic values of the pressure and Grüneisen parameter as follows

\[
\epsilon_o = X[n, P_o(n), \gamma_o(n)]
\tag{150}
\]

\[
\mu_o = Y[n, P_o(n), \gamma_o(n)]
\tag{151}
\]

The relativistic values of $P_o$ and $\gamma_o$ are obtained from the solution of the simultaneous equations (14) and (15). Thus the relativistic values of $\epsilon_o$ and $\mu_o$ are obtained indirectly from the ground state solution of the relativistic trace equation (1).

A complete relativistic thermodynamic description of electromagnetic waves in matter requires the determination of $E_{or}$ and $\gamma_{or}$ by the simultaneous solution of equations (113) and (114). But for an approximate solution only equation (113) can be used with $\gamma_{or} = 1/3$. Placing equation (146) and (147) into equation (113) gives the following differential equations for $E$ and $H$


|sn^(E2)+3  [2n^o.  a+VCo]n^(E2) 


(152) 


d2 

+  7  e2[3”2  -r  -  3(1  +  V”  3^  +  +  4>Eo]  +  8E  -  7  #£ 

dn 


2  Mon 


j2  -  .  dyo 

^(H)  +  fC2n3r 


-  <1  + > 


(153) 


+  I  "2C3"2  7T  -  3(1  +  V"  3T  +  <*o  +  «>».J  +  %  *  7 

an 

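where $g_E$ and $g_H$ are obtained from equations (118) and (119) to be given approximately as

\[
g_E \approx \tfrac{3}{2}\,\epsilon_o E^2(\sigma_o - 1)(\gamma_o - \tfrac{1}{3})
\tag{154}
\]

\[
g_H \approx \tfrac{3}{2}\,\mu_o H^2(\sigma_o - 1)(\gamma_o - \tfrac{1}{3})
\tag{155}
\]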


Combining equations (152) through (155) yields the following equations for the relativistic values of the electric and magnetic fields in matter assuming that $E$ and $H$ are not explicitly density dependent


.2  . 

9  9  d  £  de  9 

E  [3n  — = - 3(1  +  y  )n  -z —  +  (3o  y  -  o  +  5)e  ]  -  e  E 

L  ,2  '  'o'  dn  o'o  o  '  oJ  oa 

dn  . 


(156) 


d2  du 

H2[3n2  — -  3(1  +  yQ)n  +  (3o„y„  -  an  +  5)yJ  -  yV 
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o'o 


o  a 


(157) 
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Assuming  e0  np°  ,  pQ  *  nV°  ,  and  PQ  ^  n°°  in  equations  (156)  and  (157) 
gives  the  following  approximate  equations 
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a  +  3(o  -  P  )y  +5 

o  o  o  o 


(158) 
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(162) 


The determination of the relativistic phase velocity for electromagnetic waves in matter proceeds in a manner similar to that for the case of mechanical waves that was treated in equations (122) through (139) except that the temperature dependent nonrelativistic phase velocity is given by


(163) 


where $\epsilon^a$ and $\mu^a$ = nonrelativistic permittivity and permeability respectively. The zero temperature limit of equation (163) is written as


/  a  a^-1 

(Vo) 


(164) 
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From  the  phase  velocities  given  in  equations  (163)  and  (164)  one  can  calculate 
the nonrelativistic radiation energy density and Grüneisen parameters, $E_{or}^a$ and $\gamma_{or}^a$ respectively, by the procedure outlined in equations (126) through (133). Then the solution of equations (113) and (114) yields the corresponding relativistic radiation energy density and Grüneisen parameter, $E_{or}$ and $\gamma_{or}$ respectively. From $E_{or}$ and $\gamma_{or}$ one obtains an estimate of the relativistic diffuse radiation factor $\Gamma_r$ by the procedure outlined in equations (134) through (138). Finally the relativistic phase velocity for electromagnetic waves in matter is given by


W 


c 


(165) 


If it is assumed that the diffuse radiation factor is independent of temperature the substitution $\Gamma_r = \gamma_{or}$ can be made in equation (165).

9. CONCLUSION. The trace equation of the relativistic thermodynamic ground state is reduced to two Callan-Symanzik type renormalization group equations that connect the matter fields E and P with the thermodynamic gauge fields $\gamma$ and b. The gauge parameters are necessary to ensure that the trace equation is invariant under a local scale transformation. The assumption of local scale invariance under changes of the correlation length of the system leads in a natural way to a set of differential equations for the gauge parameters. These are the potential forms of the renormalization group equations for the ground state. The renormalization group equations for radiation in matter can be written in terms of radiation potentials or in the form of radiative Callan-Symanzik equations. The radiation equations for a general thermodynamic system are applied to waves in solids and quantum liquids, and a set of coupled second order differential equations is developed that determines the relativistic radiation energy density and Grüneisen parameter. Finally, a simple scaling relation is developed for the amplitude of waves propagating through materials of different density.

No mass or energy scale occurs in the equations of relativistic thermodynamics, but the temperature and volume scales that appear in these equations are similar to the mass cutoff parameter that appears in the Callan-Symanzik equations of quantum field theory.1 Therefore in analogy to the dimensional transmutation of Coleman and Weinberg there may appear a mass associated with the gauge bosons that correspond with the gauge parameters $\gamma_r$ and $b_r$.12 On the other hand, the ground state of a relativistic thermodynamic system may exhibit a broken symmetry in which case the gauge bosons can become massive by the Higgs mechanism.1 In either case massive thermal gauge bosons should exist that are associated with the thermodynamic gauge parameters $\gamma_r$ and $b_r$. The gauge boson associated with the Grüneisen parameter should exist even for T = 0 solids or quantum liquids. Therefore new physical phenomena are expected to occur in bulk matter that is subjected to high pressures. In addition, the results of this paper should have engineering and geophysics applications.


RELATIVISTIC WAVE EQUATIONS FOR REAL GASES


ABSTRACT. The relativistic wave equation for a generalized thermodynamic system is developed. The solution of this equation is obtained for the real gases whose pressure is described by a virial expansion. A procedure is given for calculating the relativistic amplitude and phase velocity for mechanical waves propagating in real gases. The relativistic wave amplitude is calculated by a virial expansion whose coefficients are determined from the wave equation. The relativistic effects on wave propagation in gases manifest themselves only through the third virial coefficient, and therefore these effects are expected to be observed only at high pressures such as found in atmospheric nuclear explosions, the interaction of directed energy beams with the atmosphere, stellar atmospheres, and in high-pressure-physics laboratory experiments. The effects of curvature waves in spacetime on the pressure of real gases are also considered, and applications to the detection of gravitational radiation are suggested.

1. INTRODUCTION. Local gauge (scale) invariance plays a fundamental role in the description of diverse physical phenomena.1-3 The requirement of local scale invariance suggests that relativistic thermodynamics can be formulated on the basis of a relativistic trace equation that relates the pressure and internal energy fields to a set of gauge parameters.4,5 The trace equation for a thermodynamic system can be written as a partial differential equation involving the energy density, pressure, and two gauge parameters. The scale transformations refer to changes in the correlation length of the system, and the scale invariance establishes a correspondence between different physical states of a relativistic thermodynamic system. This correspondence is encompassed by the renormalization group differential equations that describe the variation of the gauge parameters with the magnitude of ambient matter fields such as pressure and energy density.

For the case where the thermodynamic system has a well defined zero temperature state, such as is the case for solids and quantum liquids, the trace equation leads to a set of coupled second order differential equations for the simultaneous determination of the zero temperature values of the pressure and Grüneisen parameter.4 For real gases whose pressure is described by a virial expansion, the trace equation yields a relativistic expression for the third virial coefficient.4 This paper derives the relativistic equation for radiation in a generalized thermodynamic system, and then derives the equations that are necessary to calculate the wave amplitudes and phase velocity for waves in real gases. This is done by a perturbation procedure that is applied to the basic trace equation that describes the ground state of a relativistic thermodynamic system.


The  trace  equation  of  relativistic  thermodynamics  is  written  as4 


“  +  T(§)PV-3V3>>B  -  “a  + 


where U = relativistic internal energy, P = relativistic pressure, T = absolute temperature, V = volume of substance, and $U^a$ and $P^a$ = corresponding nonrelativistic internal energy and pressure. Throughout this paper the index "a" will refer to nonrelativistic calculations. It is easy to show that equation (1) can be written as follows4

&<TU>  '  bvf  -  3v[^<PV)  -  yf£)  (2) 

-■i(TU“)  -b*vf£ 


where 


Y  ■-(—) 

V3T4 


(3) 


TOP/3T) 

b "  (p  -  v 


(4) 


a  T(3Pa/3T)y 
(Pa  -  K^) 


(5) 


and where $\gamma$ = relativistic Grüneisen parameter, $C_V$ = relativistic heat capacity at constant volume, and where


"t 


(6)


are the relativistic and nonrelativistic values of the bulk modulus respectively. The nonrelativistic Grüneisen parameter is defined as follows


where $C_V^a$ = nonrelativistic heat capacity at constant volume. Equation (2) can be rewritten in terms of the energy density as follows5


(l-b+T£-bv£)E-j(l  +  T+V&-YT&)p  (9) 

where $E = U/V$ = relativistic energy density, and $E^a = U^a/V$ = nonrelativistic energy density. The parameters $\gamma$ and b are the two gauge parameters of relativistic thermodynamics.5

Wave motion in relativistic gases can be of two types. The first corresponds to mechanical vibrations of the gas which result in pressure changes in time and space. This type of wave motion is described by a relativistic wave equation for real gases. Such an equation can be developed by first considering relativistic waves in a completely general thermodynamic medium and then specializing to the case of real gases. It is required to find both the relativistic amplitude and sound speed for waves in real gases. In order to do this the nonrelativistic wave amplitude and phase velocity must first be determined. The relativistic effects appear only in the third and higher virial coefficients of the real gas state equation. Therefore it is necessary to solve the relativistic radiation equation for real gases to determine the relativistic value of the third virial coefficient for radiation in the real gas system. The relativistic diffuse radiation factor and the relativistic sound speed in real gases are then determined from the values of the relativistic third virial coefficient for radiation. These effects should be important only for real gases at high pressures where the third virial coefficient contributes significantly to the equation of state.

A second kind of wave motion in gases can be induced by the coupling of the wave motions in spacetime with some characteristic parameter of real gases. The wave motions in spacetime are gravitational waves. The attempts at detecting gravitational radiation over the past twenty years by various methods including the use of solid body resonance detectors have not been successful.6-8 This may be due to the lack of adequate sensitivity of present day detectors because the cosmic sources of gravity waves are thought to be very weak.9-11 On the other hand the lack of positive experimental results using solid body detectors may be due to a basic insensitivity of this type of detector, and it has been suggested that real gases and liquids may be better suited for a detector material because the third virial coefficient is expected to be sensitive to changes in the metric of spacetime.12 This paper calculates the adiabatic changes in gas pressure that are expected to occur in a detector that is subjected to the tidal effects of gravity waves.

The procedure followed in this paper is to: a) review the theory of the relativistic ground state of real gases, b) determine the equations that describe relativistic waves in a generalized thermodynamic medium, c) develop a simple nonrelativistic calculation of the amplitude of waves in real gases, d) determine the solution of the wave equation for real gases by performing a perturbation calculation on the relativistic ground state equation for real gases, e) determine the relativistic values of the wave amplitude and phase velocity, and f) determine the adiabatic changes of pressure, volume, and temperature for a real gas that is interacting with gravitational radiation.

2. RELATIVISTIC GROUND STATE OF REAL GASES. The form of the solution
of the trace equation (1) depends on the type of physical system being considered.
For real gases the nonrelativistic and relativistic pressure, energy
density, bulk modulus, and molar heat capacity (specific heat) are written in
virial form respectively as13,14


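    P^a = n R^a T \left[ 1 + n B^a(T) + n^2 C^a(T) + \cdots \right]    (10)

    E^a = n R^a T \left[ \frac{3}{2} - n T \frac{\partial B^a}{\partial T} - \frac{1}{2} n^2 T \frac{\partial C^a}{\partial T} - \cdots \right]    (11)

    K_T^a = n R^a T \left[ 1 + 2 n B^a(T) + 3 n^2 C^a(T) + \cdots \right]    (12)

    \tilde{C}_V^a = R^a \left[ \frac{3}{2} - n \left( 2T \frac{\partial B^a}{\partial T} + T^2 \frac{\partial^2 B^a}{\partial T^2} \right) - \frac{1}{2} n^2 \left( 2T \frac{\partial C^a}{\partial T} + T^2 \frac{\partial^2 C^a}{\partial T^2} \right) - \cdots \right]    (13)

and

    P = n R T \left[ 1 + n B(T) + n^2 C(T) + \cdots \right]    (14)

    E = n R T \left[ \frac{3}{2} - n T \frac{\partial B}{\partial T} - \frac{1}{2} n^2 T \frac{\partial C}{\partial T} - \cdots \right]    (15)

    K_T = n R T \left[ 1 + 2 n B(T) + 3 n^2 C(T) + \cdots \right]    (16)

    \tilde{C}_V = R \left[ \frac{3}{2} - n \left( 2T \frac{\partial B}{\partial T} + T^2 \frac{\partial^2 B}{\partial T^2} \right) - \frac{1}{2} n^2 \left( 2T \frac{\partial C}{\partial T} + T^2 \frac{\partial^2 C}{\partial T^2} \right) - \cdots \right]    (17)
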


where

    n = N/V = 1/\tilde{V}    (18)

where N = number of moles, \tilde{V} = molar volume; R^a, B^a(T) and C^a(T) = nonrelativistic
values of the gas constant, second virial coefficient, and third
virial coefficient respectively; R, B(T), and C(T) = relativistic values of
the gas constant, second virial coefficient, and third virial coefficient
respectively; and \tilde{C}_V^a and \tilde{C}_V = nonrelativistic and relativistic values of the
molar heat capacity (specific heat) respectively. The relationship between
the extensive, intensive, and molar quantities that are used in this paper
is as follows

    C_V^a = N \tilde{C}_V^a = (\partial U^a / \partial T)_V    (19)

    C_V = N \tilde{C}_V = (\partial U / \partial T)_V    (20)

    E^a = n \tilde{U}^a = U^a / V    (21)

    E = n \tilde{U} = U / V    (22)

where \tilde{U}^a and \tilde{U} = nonrelativistic and relativistic internal energy per mole.
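
As a numerical illustration of how the pressure series (10) and the bulk modulus series (12) fit together, the sketch below evaluates the virial pressure truncated after the third coefficient and compares the series for K_T with a finite-difference estimate of n(dP/dn)_T. It is a minimal sketch and not part of the original calculation: the function names, the SI value used for the gas constant, and the sample virial coefficients are illustrative assumptions.

```python
R_GAS = 8.314462618  # J/(mol K); SI value assumed for illustration


def virial_pressure(n, T, B, C, R=R_GAS):
    """Pressure from the virial series truncated after the third coefficient."""
    return n * R * T * (1.0 + n * B + n**2 * C)


def virial_bulk_modulus(n, T, B, C, R=R_GAS):
    """Series form of the isothermal bulk modulus, K_T = n (dP/dn)_T."""
    return n * R * T * (1.0 + 2.0 * n * B + 3.0 * n**2 * C)


def numerical_bulk_modulus(n, T, B, C, rel_step=1e-6):
    """Central-difference estimate of n (dP/dn)_T from the pressure series."""
    h = n * rel_step
    dP_dn = (virial_pressure(n + h, T, B, C)
             - virial_pressure(n - h, T, B, C)) / (2.0 * h)
    return n * dP_dn


# Hypothetical sample state: n in mol/m^3, T in K, B in m^3/mol, C in m^6/mol^2.
n, T, B, C = 100.0, 300.0, -1.5e-5, 1.0e-9
k_series = virial_bulk_modulus(n, T, B, C)
k_numeric = numerical_bulk_modulus(n, T, B, C)
```

The two estimates should agree to within the finite-difference error, which reflects that the coefficients 2 and 3 in the bulk modulus series come directly from differentiating the pressure series with respect to n.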

The relationships between the relativistic and the nonrelativistic functions
that appear in equations (10) through (17) are given by4

and where T_R = relativity temperature constant, and T_CR = conjugate relativity
temperature constant. The relationship between T_R and T_CR is shown in Figure
1. Thus the relativistic effects enter the real gas state equation only
through the third and higher virial coefficients; the ideal gas term and the
second virial coefficient are unaffected.

The relativity temperature T_R and the conjugate relativity temperature
T_CR are related to the critical temperature of a real gas. The conditions
for the critical point can be expressed in terms of the second and third
virial coefficients as follows15

    B(T_{crit}) = -\tilde{V}(T_{crit})    (27)

    3C(T_{crit}) = \tilde{V}^2(T_{crit})    (28)

or equivalently

    \frac{3C(T_{crit})}{B^2(T_{crit})} = 1    (29)

where T_crit = critical temperature and \tilde{V}(T_crit) = critical molar volume.
Equations (24), (25), and (29) give the critical point condition as16


    C^a = \frac{1}{3} [B^a]^2 (1 + 9 \ln \psi^a)    (30)

and gives the relationship between T_crit and T_R (or T_CR) that is shown in
Figure 2. Figure 3 gives the dependence of the critical molar volume on
the relativity temperature.
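
The critical point conditions (27) through (29) can be checked directly for a virial series truncated after the third coefficient, P = RT(1/V + B/V^2 + C/V^3) with V the molar volume. The sketch below is a minimal illustration under that truncation assumption; the function name and the sample molar volume are hypothetical. It returns the dimensionless factors of the first and second volume derivatives of P, both of which must vanish at the critical point.

```python
def critical_point_residuals(V, B, C):
    """Dimensionless residuals of dP/dV = 0 and d2P/dV2 = 0.

    For P = R T (1/V + B/V**2 + C/V**3):
      dP/dV   = -(R T / V**2) * (1 + 2 B/V + 3 C/V**2)
      d2P/dV2 =  (R T / V**3) * (2 + 6 B/V + 12 C/V**2)
    Only the bracketed factors are returned.
    """
    first = 1.0 + 2.0 * B / V + 3.0 * C / V**2
    second = 2.0 + 6.0 * B / V + 12.0 * C / V**2
    return first, second


# Hypothetical critical molar volume in m^3/mol.
V_crit = 1.0e-4
B_crit = -V_crit            # condition (27)
C_crit = V_crit**2 / 3.0    # condition (28), so that 3C/B^2 = 1 as in (29)
r1, r2 = critical_point_residuals(V_crit, B_crit, C_crit)
```

Both residuals vanish (to rounding) when B = -V and 3C = V^2, and at least one of them is nonzero for any other choice of C at the same B and V.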

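The Grüneisen function can be evaluated for real gases using equation
(14) which gives

    (\partial P / \partial T)_n = n R \left[ 1 + n f_1(T) + n^2 f_2(T) + \cdots \right]    (31)

where

    f_1(T) = T \frac{\partial B}{\partial T} + B    (32)

    f_2(T) = T \frac{\partial C}{\partial T} + C    (33)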

and equation (17) which gives

Then equations (3), (31), and (34) give the relativistic Grüneisen parameter
as

    \gamma = \frac{2}{3} \left[ 1 + n \gamma_1(T) + n^2 \gamma_2(T) + \cdots \right]    (37)

where

    \gamma_1(T) = f_1(T) + g_1(T)    (38)

    \gamma_2(T) = f_2(T) + g_2(T) + f_1(T) g_1(T)    (39)

Expressions analogous to equations (31) through (39) hold for the nonrelativistic
Grüneisen parameter.


3. EXCITATIONS IN THERMODYNAMIC SYSTEMS. This section considers mechanical
radiation in thermal media. Only small amplitude vibrations are considered.
When radiation is present in a thermal system the relativistic
energy density, pressure, bulk modulus, and heat capacity are written as
E + E_r, P + P_r, K_T + K_Tr, and C_V + C_Vr respectively, while the corresponding
nonrelativistic quantities become E^a + E^a_r, P^a + P^a_r, K^a_T + K^a_Tr, and
C^a_V + C^a_Vr respectively, where

    E^a_r and E_r = nonrelativistic and relativistic radiation energy density respectively

    P^a_r and P_r = nonrelativistic and relativistic radiation pressure respectively

    K_Tr = relativistic bulk modulus of the radiation

    K^a_Tr = nonrelativistic bulk modulus of the radiation

    C_{Vr} = V (\partial E_r / \partial T)_V = relativistic heat capacity of radiation

    C^a_{Vr} = V (\partial E^a_r / \partial T)_V = nonrelativistic heat capacity of radiation

The radiation terms are assumed to be much smaller than the ground state
terms, i.e., E_r << E and P_r << P.

The Grüneisen parameter γ and the gauge parameter b become γ + δ_r and
b + β_r, where δ_r and β_r = incremental changes in the parameters γ and b
when radiation is present in the system. The increment in the Grüneisen
parameter of the system due to the presence of small amplitude radiation is
obtained from the defining equation (3) by noting that


    \gamma + \delta_r = \frac{V}{C_V + C_{Vr}} \frac{\partial}{\partial T} (P + P_r)    (40)


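Expanding the denominator in equation (40), keeping only first order terms,
and finally subtracting equation (3) gives

    \delta_r = \frac{V}{C_V} \frac{\partial E_r}{\partial T} (\gamma_r - \gamma)    (41)

             = \frac{V}{C_V} \left[ E_r \frac{\partial \Gamma_r}{\partial T} + \frac{\partial E_r}{\partial T} (\Gamma_r - \gamma) \right]    (42)

where γ_r = relativistic Grüneisen parameter of the radiation itself, and is
defined as

    \gamma_r = \frac{V}{C_{Vr}} \frac{\partial P_r}{\partial T} = \frac{\partial P_r}{\partial T} \Big/ \frac{\partial E_r}{\partial T}    (43)

and where Γ_r = relativistic diffuse radiation factor which is defined by

    \Gamma_r = P_r / E_r    (44)

Note that a comparison of equations (41) and (42) shows that if Γ_r is independent
of temperature, then Γ_r = γ_r.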

Similarly,  the  increment  in  the  gauge  parameter  b  due  to  the  presence 
of  radiation  in  the  medium  is  obtained  from  equation  (4)  by  observing  that 
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(45) 


Expanding  the  denominator  in  equation  (45),  keeping  only  first  order  terms, 
and  subtracting  equation  (4)  gives 


9P 

T  — - 
9T 

Tf(pr-Kxr) 

p-Rt 

(P  -  Kt)2 

9Er 
YrT  3T~ 

yT  3T  ( Pr  “  ^r) 

p-Kt  (P  -14.)2 


”  ™  ( rrEr  )  ■  |p  _  J^j2  [rrEr  +  V  3V  (  FrEr)] 

Note  that  equation  (46)  can  be  rewritten  as 

B  ^  ■  T  I  (fr  -  V) 

r  ‘  p  -  «t  c  -  «i)2 


(46) 


(47) 


where  br  ■  radiation  gauge  parameter  given  by 


b 


r 


(48) 


The parameters γ_r and b_r are the two gauge parameters for radiation in a
thermal medium.
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The  corresponding  nonrelativistic  values  of  the  <5r  and  Br  are  given  by 


6  - 


V  _ r  ,  a 

=  3T 


a\ 
Y  ) 


3P 


3P 

3T 


P*  -  3T 


(p*  -  k ;)■ 


[rr“Er+Vw(rrEr)] 


(49) 


(50) 


where γ^a_r is given by the nonrelativistic analog of equation (43), and Γ^a_r is
given by the nonrelativistic analog of equation (44). Note also that δ_r and
β_r are small quantities because E_r is assumed to be small compared to E.

But the radiation gauge parameters γ_r and b_r, and the diffuse radiation
factor Γ_r, are not small quantities since they are defined as the ratio of
two small numbers.

When  radiation  is  present  in  a  general  thermodynamic  system,  equation 
(9)  can  be  written  as 

[l  '  (  b  +  8r)  +  T  W  -  (b  +  6r)V  W]  (E  +  Er)  (51) 

*  3[l  ♦  V  +  «t  +  V  ±  -  (y  +  5r)t  £](p  ♦  Pt) 

■[i-(»a  +  ®;)+Tw  -(b‘+8J)v^](Ea  +  E;) 


Subtracting  equation  (9)  from  equation  (51)  and  keeping  only  first  order 
terms  yields  the  following  first  order  radiation  equation 

(l-b+I^-bV^)Er-8r(T|£-p)  (52) 

I 

-  3[(l  +  ^  +  VW-^^)Pr  -  Sr(TW-?)] 

■  t1  -  b“ + 1  w  - bav  w)  K  -  8J(T  ir  -  pa) 
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where  the  following  standard  thermodynamic  relationship  was  used 


    \frac{\partial U}{\partial V} = E + V \frac{\partial E}{\partial V}    (53)


Equation  (52)  can  also  be  written  as 


■sKI-1'1 


3d 

where U_r = V E_r = relativistic radiation internal energy, and U^a_r = V E^a_r =
nonrelativistic radiation internal energy. Equation (52) or equation (54) can
serve as the basic first order relativistic thermodynamic equation governing
radiation in a thermal medium. Equations (52) and (54) are completely general
and can be used to derive the radiation equations for real gases. To do this
it is first necessary to develop a nonrelativistic theory of mechanical radiation
in real gases so that the terms on the right hand side of equations (52)
or (54) can be evaluated.

4.  NONRELATIVISTIC  THEORY  OF  WAVES  IN  REAL  GASES.  A  simple  nonlinear 
nonrelativistic  calculation  of  the  radiation  energy  density  and  pressure  for 
waves  in  real  gases  is  presented  that  will  allow  the  calculation  of  the  non¬ 
relativistic  amplitude  of  the  waves.  The  nonrelativistic  radiation  pressure 
and  energy  density  are  written  in  a  virial  form  analogous  to  the  ground 
state  equations  (10)  and  (11)  as  follows 


    P^a_r = \Gamma^a_{ro} n R^a_r T + n^2 R^a T B^a_r(T) + n^3 R^a T C^a_r(T) + \cdots    (55)

    E^a_r = n R^a_r T - n^2 R^a T^2 \frac{\partial B^a_r}{\partial T} - \frac{1}{2} n^3 R^a T^2 \frac{\partial C^a_r}{\partial T} - \cdots    (56)

where Γ^a_ro = 1/3 = diffuse radiation factor for an ideal gas, and where the
nonrelativistic radiation coefficients R^a_r, B^a_r(T), and C^a_r(T) are to be
determined from a simple model that describes the vibrations in a real gas.
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The  form  of  Che  energy  density  In  equation  (56)  follows  from  equation  (55) 
by  the  requirement  that 


The functions δ^a_r, β^a_r, γ^a_r, and Γ^a_r that appear in the right hand side of the
wave equation (52) can be calculated in terms of the nonrelativistic radiation
virial coefficients from equations (55) and (56) by using equations (49) and
(50) and the nonrelativistic analogs of equations (43) and (44).

The nonrelativistic vibrations in a mechanical medium have an energy
density given by17


    E^a_r = \frac{1}{4} k_a^2 A_a^2 K^a_T    (58)

where k_a = nonrelativistic wave number, and A_a = nonrelativistic wave amplitude.
The wave number and wave amplitude that appear in equation (58) are
also expected to have a virial expansion of the form

    k_a^2 A_a^2 = k_o^2 A_o^2 \left[ 1 + n \alpha^a_1(T) + n^2 \alpha^a_2(T) + \cdots \right]    (59)

where α^a_1 and α^a_2 are unknown functions of temperature that are to be determined,
and where k_o and A_o = known wave number and wave amplitude respectively
associated with waves in an ideal gas. Combining equations (12), (58) and
(59) gives

    E^a_r = \frac{1}{4} k_o^2 A_o^2 n R^a T \left[ 1 + n (2B^a + \alpha^a_1) + n^2 (3C^a + 2\alpha^a_1 B^a + \alpha^a_2) + \cdots \right]    (60)

Comparing equations (56) and (60) gives

    R^a_r = \frac{1}{4} k_o^2 A_o^2 R^a    (61)

    -T \frac{\partial B^a_r}{\partial T} = \frac{1}{4} k_o^2 A_o^2 (2B^a + \alpha^a_1)    (62)

    -T \frac{\partial C^a_r}{\partial T} = \frac{1}{2} k_o^2 A_o^2 (3C^a + 2\alpha^a_1 B^a + \alpha^a_2)    (63)

Equation (61) immediately determines the value of the radiation coefficient
R^a_r, but further equations in addition to equations (62) and (63) are needed
to determine the radiation virial coefficients B^a_r and C^a_r. This is so because
the functions α^a_1 and α^a_2 are also unknown and need to be determined.
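
The coefficient pattern in equation (60) is the Cauchy product of the bulk modulus series (12) with the amplitude series (59), and the same pattern produces γ_1 and γ_2 in equations (38) and (39). The sketch below is a minimal illustration of that bookkeeping; the helper name and the sample numbers are hypothetical and carry no physical meaning.

```python
def series_product(a, b):
    """Multiply two power series in n given as coefficient lists.

    a[k] and b[k] are the coefficients of n**k; the result is truncated
    at the order of the shorter input.
    """
    order = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(order)]


# Hypothetical sample values for B^a, C^a, alpha_1^a, alpha_2^a.
B, C, alpha1, alpha2 = -2.0, 5.0, 0.5, 3.0

bulk_modulus_series = [1.0, 2.0 * B, 3.0 * C]   # bracket of equation (12)
amplitude_series = [1.0, alpha1, alpha2]        # bracket of equation (59)
energy_series = series_product(bulk_modulus_series, amplitude_series)
```

The entries of `energy_series` are 1, 2B + α_1, and 3C + 2α_1 B + α_2, which is the bracket of equation (60).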

The additional equations needed in conjunction with equations (62) and
(63) are those involving the nonrelativistic diffuse radiation factor Γ^a_r defined
by

    P^a_r = \Gamma^a_r E^a_r    (64)


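Combining equations (55), (56) and (64) gives the following expression for Γ^a_r

    \Gamma^a_r = \Gamma^a_{ro} + n \Gamma^a_{r1} + n^2 \Gamma^a_{r2} + \cdots    (65)

where as before Γ^a_ro = 1/3 and

    \Gamma^a_{r1} = \frac{R^a}{R^a_r} \left[ B^a_r + \Gamma^a_{ro} T \frac{\partial B^a_r}{\partial T} \right]    (66)

    \Gamma^a_{r2} = \frac{R^a}{R^a_r} \left[ C^a_r + \frac{1}{2} \Gamma^a_{ro} T \frac{\partial C^a_r}{\partial T} + \Gamma^a_{r1} T \frac{\partial B^a_r}{\partial T} \right]    (67)

where R^a/R^a_r is given by equation (61). But it is well known that the general
expression for the diffuse radiation factor is17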


a  1  n  dW3 

r  3  y*  dn 


(68) 


where W^a = phase velocity of mechanical waves in a thermodynamic medium. The
phase velocity of waves in a general thermodynamic medium is given by4
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(69) 


+  vaT  II 

*T  _  Y  1  3T 
E3  +  Ya0® 
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where 


E 


a 


Ea  +  Pa  +  Kj 


(70) 


0 


a 


(71) 


Thus in general W^a = W^a(n,T), and substitution of equation (69) into equation
(68) and expanding in terms of the ground state virial coefficients automatically
determines the expansion coefficients for the diffuse radiation factor
given in equation (65). Therefore it will be assumed that Γ^a_ro, Γ^a_r1, Γ^a_r2,
and so on, can be obtained from the sound speed and are known functions of
temperature through the ground state virial expansion coefficients. Then equations
(66) and (67) can be integrated to obtain the nonrelativistic radiation
virial coefficients B^a_r(T) and C^a_r(T). Finally equations (62) and (63)
can be used to calculate α^a_1 and α^a_2 as follows


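    \alpha^a_1(T) = -\frac{R^a}{R^a_r} T \frac{\partial B^a_r}{\partial T} - 2B^a    (72)

    \alpha^a_2(T) = -\frac{1}{2} \frac{R^a}{R^a_r} T \frac{\partial C^a_r}{\partial T} - 3C^a - 2\alpha^a_1 B^a    (73)

Then the product k_a^2 A_a^2 can be determined from equation (59). It will be assumed that
k_a = k_o, and therefore equation (59) gives the nonrelativistic wave amplitude.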

5. SOLUTION OF THE WAVE EQUATION FOR REAL GASES. The solution of the
radiation equation (52) for the real gases can be most easily obtained from
equations (14), (15) and (23) through (26) that describe the relativistic
ground state solution of equation (1) for the real gases. The relativistic
expressions for the radiation pressure and energy density are written in a
form analogous to equations (55) and (56) as follows

    P_r = \Gamma_{ro} n R_r T + n^2 R T B_r(T) + n^3 R T C_r(T) + \cdots    (74)

    E_r = n R_r T - n^2 R T^2 \frac{\partial B_r}{\partial T} - \frac{1}{2} n^3 R T^2 \frac{\partial C_r}{\partial T} - \cdots    (75)

where the relativistic radiation parameters Γ_ro and R_r, and the relativistic
radiation virial coefficients B_r and C_r, are to be determined from the solution
of the radiation equation (52). The functions δ_r, β_r, γ_r, and Γ_r that
appear in equation (52) can be calculated from equations (74) and (75) by
using equations (42), (43), (44), and (46).

The  solution  of  the  radiation  equation  (52)  for  the  real  gases  can  be 
immediately  obtained  from  the  ground  state  solution  of  equation  (1),  as  given 
by  equations  (23)  through  (26)  for  the  real  gases,  by  a  simple  perturbation 
method  applied  to  this  solution.  Thus  when  mechanical  radiation  is  present 
in  a  real  gas,  equations  (23)  through  (26)  become 


    R + R_r = R^a + R^a_r    (76)

    B(T) + B_r(T) = B^a(T) + B^a_r(T)    (77)

    C(T) + C_r(T) = C^a(T) + C^a_r(T) - 3 [B^a(T) + B^a_r(T)]^2 \ln (\psi^a + \psi^a_r)    (78)

Subtracting equations (23) through (25) from equations (76) through (78)
respectively and keeping only first order terms yields


    R_r = R^a_r    (79)

    B_r = B^a_r    (80)

    C_r = C^a_r - 3 [2 B^a B^a_r + (B^a_r)^2] \ln \psi^a - 3 (B^a + B^a_r)^2 \ln \left( 1 + \frac{\psi^a_r}{\psi^a} \right)    (81)

    C_r \approx C^a_r - 6 B^a B^a_r \ln \psi^a - 3 (B^a)^2 \frac{\psi^a_r}{\psi^a}    (81a)

where the following first order approximation has been used

    \ln (\psi^a + \psi^a_r) = \ln \psi^a + \ln \left( 1 + \frac{\psi^a_r}{\psi^a} \right) \approx \ln \psi^a + \frac{\psi^a_r}{\psi^a}    (82)

to obtain the result in equation (81a). The small radiation term ψ^a_r that
occurs in equation (81a) is obtained from the defining equation (26) as follows


    \psi^a + \psi^a_r = \left[ \frac{B^a(T) + B^a_r(T)}{B^a(T_R) + B^a_r(T_R)} \right]^{2/3}    (83)


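Expanding the denominator in equation (83), subtracting equation (26),
and finally dividing by ψ^a yields the following first order approximation

    \frac{\psi^a_r}{\psi^a} \approx \frac{2}{3} \left[ \frac{B^a_r(T)}{B^a(T)} - \frac{B^a_r(T_R)}{B^a(T_R)} \right]
                            = \frac{2}{3} \left[ \frac{B^a_r(T)}{B^a(T)} - \frac{B^a_r(T_{CR})}{B^a(T_{CR})} \right]    (84)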
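
The first order result (84) can be compared numerically with the ratio obtained directly from equation (83). The sketch below is a minimal illustration, assuming that ψ^a is the 2/3 power of B^a(T)/B^a(T_R) as the form of equation (83) suggests; the function names and the sample coefficients are hypothetical. The ground state coefficients at T and T_R are chosen with the same sign so that the fractional power stays real.

```python
def psi(B_T, B_TR):
    """psi = [B(T) / B(T_R)]**(2/3); assumes B(T) and B(T_R) share a sign."""
    return (B_T / B_TR) ** (2.0 / 3.0)


def psi_ratio_exact(B_T, Br_T, B_TR, Br_TR):
    """psi_r / psi computed from the perturbed ratio without linearizing."""
    return psi(B_T + Br_T, B_TR + Br_TR) / psi(B_T, B_TR) - 1.0


def psi_ratio_first_order(B_T, Br_T, B_TR, Br_TR):
    """First order form: (2/3) [B_r(T)/B(T) - B_r(T_R)/B(T_R)]."""
    return (2.0 / 3.0) * (Br_T / B_T - Br_TR / B_TR)


# Hypothetical ground state and (much smaller) radiation coefficients.
B_T, B_TR = -1.2e-4, -0.8e-4
Br_T, Br_TR = 1.0e-9, -2.0e-9
exact = psi_ratio_exact(B_T, Br_T, B_TR, Br_TR)
approx = psi_ratio_first_order(B_T, Br_T, B_TR, Br_TR)
```

The difference between `exact` and `approx` is second order in the small ratios B_r/B, which is the sense in which equation (84) is a first order approximation.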


Equations (79) through (81) give the relativistic radiation virial coefficients
in terms of the corresponding nonrelativistic radiation virial coefficients
and in terms of the second order ground state virial coefficient
B^a(T). Note that at the Boyle temperature T_B, at which B^a(T_B) = 0,
it follows from equation (81a) that C_r(T_B) = C^a_r(T_B). Also note that at the
relativity temperature T_R (or at the conjugate relativity temperature T_CR)
it follows from equation (26) that ψ^a = 1, and from equation (84) that
ψ^a_r = 0, so that C_r(T_R) = C^a_r(T_R) and C_r(T_CR) = C^a_r(T_CR). Therefore any
experimental test that is conducted to determine the difference between C_r(T)
and C^a_r(T) should exclude the temperature regions around T_B, T_R and T_CR.

A  similar  result  is  already  known  for  the  ground  state  third  virial  coeffi¬ 
cient.1* 


6.  RELATIVISTIC  WAVE  AMPLITUDE  AND  PHASE  VELOCITY.  Relativistic 
effects  on  waves  in  real  gases  will  manifest  themselves  in  the  amplitude  and 
dispersive  properties  of  the  waves.  Therefore  it  is  important  to  be  able  to 
calculate  the  relativistic  amplitude  and  phase  velocity  of  waves  in  real 
gases  and  to  compare  them  with  their  corresponding  nonrelativistic  values. 

The relativistic energy density for mechanical waves in a real gas is written
in analogy to equation (58) as17

    E_r = \frac{1}{4} k^2 A^2 K_T    (85)

where k = relativistic wave number, A = relativistic amplitude, and where
K_T is given by equation (16). In a form similar to that of equation (59),
the product k^2 A^2 is written as

    k^2 A^2 = k_o^2 A_o^2 \left[ 1 + n \alpha_1(T) + n^2 \alpha_2(T) + \cdots \right]    (86)

where the relativistic functions α_1(T) and α_2(T) need to be determined. Combining
equations (16), (75), (85), and (86) gives

    R_r = \frac{1}{4} k_o^2 A_o^2 R = R^a_r    (87)

    -T \frac{\partial B_r}{\partial T} = \frac{1}{4} k_o^2 A_o^2 (2B + \alpha_1)    (88)

    -T \frac{\partial C_r}{\partial T} = \frac{1}{2} k_o^2 A_o^2 (3C + 2\alpha_1 B + \alpha_2)    (89)


Because B(T) = B^a(T) and B_r(T) = B^a_r(T) it follows from equations (62) and
(88) that α_1 = α^a_1. The value of α_2 is obtained from equation (89) to be
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(90) 


where C_r is given by equation (81) and C is given by equation (25). In this
way the relativistic expression for k^2 A^2 given by equation (86) can be calculated
in terms of nonrelativistic quantities. Since α_1 = α^a_1, it is clear
that relativistic effects affect only the second order and higher terms in
equation (86). Essentially this is due to the fact that only the third and
higher virial coefficients of the ground state are affected by relativity
as shown in equations (24) and (25).


The relativistic phase velocity can be obtained by first noting that
the relativistic diffuse radiation factor is obtained from equations (44),
(74), and (75) to be

    \Gamma_{r2} = \frac{R}{R_r} \left[ C_r + \frac{1}{2} \Gamma_{ro} T \frac{\partial C_r}{\partial T} + \Gamma_{r1} T \frac{\partial B_r}{\partial T} \right]    (94)

where C_r is given by equation (81). Therefore Γ_r(n,T) can be evaluated in
terms of nonrelativistic quantities. The relativistic sound speed can then
be calculated by solving the following equation


rr(n.T) 


1  n  dW 
3  W  dn 


(95) 


or 


(96) 


where  c  -  light  speed. 

7. GRAVITATIONAL WAVES IN REAL GASES. It has been suggested that
real gases can possibly be used in a gravity wave detector.12 This is
possible because the relativity temperature parameter T_R that occurs in the
state equation of relativistic real gases is a measure of the interaction of
the real gases with the vacuum state, and gravity waves are oscillations of
the vacuum, i.e., waves of curvature in spacetime. Gravity waves are shear-like
in nature and are not expected to directly change the volume, pressure,
or temperature of a gas, liquid or solid. Thus in the case of the Weber bar
design for a gravity wave detector, only the surface shear strain is
measured, and no success has been reported.8-11

The interactions of real gases are of dipole-dipole, dipole-quadrupole,
and quadrupole-quadrupole types.13 These interactions depend on the separation
and shape of the molecules through their dipole, quadrupole, and
higher moments.13 The values of T_R and T_crit depend on these multipole
moments as both temperatures are species dependent.4 The tidal nature of
gravity waves will alter the multipole moments across a volume of gas, and
will produce a gradient of T_R across the volume of gas in a detector. Gravity
wave detector calculations must be done in conjunction with the relativistic
state equations of the materials used in a detector. Real gases and
liquids exhibit a critical point, and the critical temperature is related
to the relativity temperature by equation (30). Solids, on the other hand,
do not have a parameter akin to T_R in their relativistic state equations.12
Gases and liquids are expected to be sensitive to gravity waves while solids
are not expected to show any response.

The values of the relativity temperature T_R and the critical temperature
are expected to vary across the volume of a gaseous gravitational wave
detector due to the tidal effects of gravity waves. Heat exchange in the
detector gas will tend to produce a uniform change in temperature. The tidal
effects of gravitation can be described by the difference between the metric
g_μν for gravitational waves and the Minkowski metric g^o_μν = (1, 1, 1, -1)
which is written as18

    h_{\mu\nu} = g_{\mu\nu} - g^o_{\mu\nu}    (97)

where the values of the small dimensionless number h give a measure of the
strength of gravitational radiation at the detector.

The gravitational potential that is associated with this weak gravitational
field is χ ≈ hc^2. In the presence of a gravitational field the
energy of a body is altered by the following quasi-static factor19

so that the effect of a gravitational wave on the relativity temperature is
to give it the value

    T_{Rg} = (1 + 2h)^{1/2} T_R \approx (1 + h) T_R    (99)

where T_Rg = value of relativity temperature in the presence of gravity waves.
The change in the value of T_R due to ambient gravity waves is therefore12

    \delta T_R = T_{Rg} - T_R \approx h T_R    (100)

A similar analysis holds for the conjugate relativity temperature T_CR.

The order of magnitude change in the value of the relativity temperature
depends on the value of h at the detector.

Many studies have been done on the relative strengths of possible
astronomical sources of gravity waves.9-11 These sources include pulsars
10^-27 < h < 10^-24, supernovae 10^-22 < h < 10^-19, and binary stars h < 10^-21.

It  is  possible  that  the  galactic  center  radiates  gravity  waves  with  h  <  10 

Taking  Tg  ^  100*K  gives  10-^  <  5Tg  <  10-^  as  a  likely  range  for  the  change 
in  the  relativity  temperature  of  a  gas  due  to  astronomical  sources  of  gravity 
waves.  The  corresponding  changes  in  pressure ,  temperature*  and  volume  in  a 
gaseous  gravitational  wave  detector  will  now  be  calculated. 
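
To give a feel for the magnitudes involved, the sketch below evaluates the first order shift δT_R = h T_R implied by equation (99) for the pulsar and supernova strain ranges quoted above, taking T_R = 100 K as in the text. It is a minimal sketch: the function and variable names are illustrative assumptions, and the exact form sqrt(1 + 2h) - 1 is evaluated with `math.log1p` and `math.expm1` because the direct expression would round to zero in double precision for strains this small.

```python
import math

# Dimensionless strain ranges quoted in the text for two source classes.
SOURCE_STRAIN_RANGES = {
    "pulsars": (1e-27, 1e-24),
    "supernovae": (1e-22, 1e-19),
}


def shift_first_order(h, T_R):
    """First order shift of the relativity temperature, delta T_R = h * T_R."""
    return h * T_R


def shift_exact(h, T_R):
    """Shift [(1 + 2h)**0.5 - 1] * T_R, evaluated without cancellation."""
    return math.expm1(0.5 * math.log1p(2.0 * h)) * T_R


T_R = 100.0  # kelvin, the value taken in the text
for name, (h_low, h_high) in SOURCE_STRAIN_RANGES.items():
    low = shift_first_order(h_low, T_R)
    high = shift_first_order(h_high, T_R)
    print(f"{name}: {low:.1e} K < delta T_R < {high:.1e} K")
```

For strains of this size the first order and exact shifts are indistinguishable, so the linear estimate δT_R = h T_R is adequate for order of magnitude purposes.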

8.  GENERALIZED  FORCE  ASSOCIATED  WITH  RELATIVITY  TEMPERATURE.  The  gen¬ 
eralized  work  done  during  a  change  of  volume  and  a  change  of  the  relativity 
temperature  of  the  system  is  given  by 


    dW = P dV + S_R dT_R = -\frac{N P}{n^2} dn + S_R dT_R    (101)

where S_R = generalized force associated with T_R. Clearly S_R has the dimensions
of an entropy. The generalized force associated with dV is clearly the
system pressure P. The generalized forces can be calculated using the Gibbs-Helmholtz
equation which states that if a generalized work is written as E dq,
where E = generalized force associated with a physical variable q, then20


(102) 


For instance E might be an electric field and q an electric charge. The
Gibbs-Helmholtz equations associated with the situation in equation (101) are
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(103) 


(104) 


Equation (104) can be used to determine the function S_R.

An expression for S_R can easily be obtained from equation (104) by making
the substitution S_R = T s_R because then equation (104) becomes


(105) 


Combining  equation  (15)  and  (18)  with  equation  (105)  gives 


1  NPT2n2 

2  NRT  3T3T 


(106) 


which  reduces  immediately  to 


s 


R 


(107) 


Finally  SR  ■  TsR  gives 


(108) 


which  can  be  written  per  unit  volume  as 


(108A) 
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or  in  molar  quantities  as 


§ 


R 


(108B) 


where C = relativistic third virial coefficient. This generalized force (entropy)
will be used subsequently to calculate the changes in volume, temperature,
and pressure in a gas due to gravity waves.
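
The chain of steps leading from equation (104) to equations (108), (108A), and (108B) can be sketched as follows. This is a hedged reconstruction, not the text of the original equations: it assumes that equation (104) has the standard Gibbs-Helmholtz form for the pair (S_R, T_R), that only the third virial coefficient C depends on T_R, and that the integration over T carries no additive function of n and T_R.

```latex
% Assumption: equation (104) is the Gibbs-Helmholtz relation for (S_R, T_R).
\left( \frac{\partial U}{\partial T_R} \right)_{T,V}
  = T \left( \frac{\partial S_R}{\partial T} \right)_{V,T_R} - S_R

% Substituting S_R = T s_R removes the undifferentiated term:
\left( \frac{\partial U}{\partial T_R} \right)_{T,V}
  = T^2 \left( \frac{\partial s_R}{\partial T} \right)_{V,T_R}

% From equations (15) and (18), U = V E = N R T [3/2 - n T dB/dT - (1/2) n^2 T dC/dT - ...],
% and only C depends on T_R (assumption stated above), so
\left( \frac{\partial U}{\partial T_R} \right)_{T,V}
  = -\frac{1}{2} N R T^2 n^2 \frac{\partial^2 C}{\partial T \, \partial T_R}
\quad\Longrightarrow\quad
\frac{\partial s_R}{\partial T}
  = -\frac{1}{2} N R n^2 \frac{\partial^2 C}{\partial T \, \partial T_R}

% Integrating over T with a vanishing integration function (assumption):
s_R = -\frac{1}{2} N R n^2 \frac{\partial C}{\partial T_R},
\qquad
S_R = T s_R = -\frac{1}{2} N R T n^2 \frac{\partial C}{\partial T_R}

% Per unit volume and per mole, using n = N/V:
\frac{S_R}{V} = -\frac{1}{2} R T n^3 \frac{\partial C}{\partial T_R},
\qquad
\frac{S_R}{N} = -\frac{1}{2} R T n^2 \frac{\partial C}{\partial T_R}
```

Under these assumptions the generalized force is proportional to the derivative of the third virial coefficient with respect to T_R, which is consistent with the statement above that C is the relativistic third virial coefficient.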

The  derivative  in  equation  (108)  can  be  evaluated  by  using  equations  (25) 
and  (26)  which  give 


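    T_R \frac{\partial C}{\partial T_R} = -3 [B^a(T)]^2 \frac{T_R}{\psi^a} \frac{\partial \psi^a}{\partial T_R}    (109)

and

    \frac{T_R}{\psi^a} \frac{\partial \psi^a}{\partial T_R} = -\frac{2}{3} \left[ \frac{T_R}{B^a(T_R)} \frac{\partial B^a(T_R)}{\partial T_R} \right]    (110)

For reference it is noted also that

    \frac{T}{\psi^a} \frac{\partial \psi^a}{\partial T} = \frac{2}{3} \frac{T}{B^a(T)} \frac{\partial B^a(T)}{\partial T}    (111)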


Combining  equations  (108B) ,  (109),  and  (110)  gives  the  final  result  as 


yn,T.TR)  -  -  \  R  ^  nV(I>]2  [x  ♦  f  ^  (112) 

» 

The entropy \tilde{S}_R is a purely relativistic quantity that is associated with the
variation of T_R and is related to the interaction of the vacuum state with
the molecules of a real gas. Equations (23) through (26) and equation (112)
represent a relativistic thermodynamic analog of the Casimir effect of quantum
electrodynamics.21
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9.  ADIABATIC  CHANGES  OF  TEMPERATURE,  VOLUME,  AND  PRESSURE.  The  first 
law  of  thermodynamics  for  the  relativistic  real  gas  can  be  written  as  follows 


    dU = dQ - P dV - S_R dT_R = dQ + \frac{N P}{n^2} dn - S_R dT_R    (113)

where dQ = T dS = increment of heat associated with the absorption of gravity
waves by a real gas, and dS = corresponding increase in entropy. Because the
internal energy is a state function it has a perfect differential which can
be written as


(114) 


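Using the Gibbs-Helmholtz equations (103) and (104) brings equation (114) into
the following form

    dU = \left( \frac{\partial U}{\partial T} \right)_{V,T_R} dT
         + \left[ T \left( \frac{\partial P}{\partial T} \right)_{V,T_R} - P \right] dV
         + \left[ T \left( \frac{\partial S_R}{\partial T} \right)_{V,T_R} - S_R \right] dT_R    (115)

Combining equation (115) with equation (113) gives the following expression for
the heat increment

    dQ = C_V dT + T \left( \frac{\partial P}{\partial T} \right)_{V,T_R} dV + T \left( \frac{\partial S_R}{\partial T} \right)_{V,T_R} dT_R    (116)

The condition for adiabatic processes is given by dQ = 0 or

    C_V dT + T \left( \frac{\partial P}{\partial T} \right)_{V,T_R} dV + T \left( \frac{\partial S_R}{\partial T} \right)_{V,T_R} dT_R = 0    (117)
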

where C_V is given by equations (17) and (20). It will be assumed that gravity
wave interactions with the real gases are sufficiently rapid that they
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can  be  described  as  adiabatic  processes.  The  general  expression  for  the 
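change of pressure in a gas due to the passage of a gravity wave will be
written as

    dP = \left( \frac{\partial P}{\partial T} \right)_{V,T_R} dT + \left( \frac{\partial P}{\partial V} \right)_{T,T_R} dV + \left( \frac{\partial P}{\partial T_R} \right)_{V,T} dT_R    (118)

Combining equations (14), (16), and (118) gives

    dP = n R \left[ 1 + n f_1(T) + n^2 f_2(T) + \cdots \right] dT + \frac{K_T}{n} dn + R T n^3 \frac{\partial C}{\partial T_R} dT_R    (119)

where f_1 and f_2 are given by equations (32) and (33) respectively. Several
special cases will now be examined.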

Using  equation  (117)  allows  several  interesting  adiabatic  situations  to 
be  considered. 

Case  a.  Adiabatic  Change  of  Temperature  at  Constant  Volume. 

For  this  case  equation  (117)  gives 


Combining  equations  (108)  and  (120)  gives 


dT 


S.V 


Bn2CJT 

2Vr 


where  the  dimensionless  quantity  J  is  given  by 


(120) 


(121) 


T  TT  2 

AR  3C  1aR  3*C 

C  5Tr  C  3T3Tr 


(122) 
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The  second  derivative  that  occurs  in  equation  (122)  is  calculated  using  equa¬ 
tion  (109)  as  follows 


TT. 


32C 


R  3T3T„ 


ip  R 


(123) 


The  result  in  equation  (121)  can  be  rewritten  using  equation  (34)  as  follows 


dT 


S.V 


CJn2T  , ,  .  2  . 

—  (1  +  gln  +  g2n  + 


(124) 


where g_1 and g_2 are given by equations (35) and (36) respectively. The sign
of the temperature change given by equation (124) depends on the sign of the
product CJ which is temperature dependent and can be positive or negative according
to the value of temperature being considered.

Case  b.  Adiabatic  Change  of  Volume  at  Constant  Temperature. 

By using the definition of the Grüneisen function given in equation (3) it
follows from equations (117) and (120) that


dV 


S,T 


(125) 


S,T 


Combining  equations  (34),  (37),  (121),  and  (125)  gives 


dV 


S,T 


dn 

2 


S,T 


dT 

2YCVTR  R 


-  [1  -  fl*  +  (ft  -f2)n‘  -•••]«„ 


(126) 


(127) 
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where  f^  and  are  given  by  equations  (32)  and  (33)  respectively. 
Case  c.  Adiabatic  Change  in  Pressure  at  Constant  Volume. 

Placing  equation  (124)  into  equation  (119)  with  dn  -  0  gives 


dP 


S.V 


RTCn~ 

3T„ 


(F 


+  F^n 


+  F2n 


)  dT_ 


(128) 


where 


F  - 


F,  - 


Fo  - 


j  +12  3£_ 

C  3TR 

(129) 

V 

(130) 

V 

(131) 

and where γ_1 and γ_2 are defined in equations (38) and (39) respectively. An
equivalent expression for dP can also be written in terms of the Grüneisen
parameter as follows


dP 


s.v 


RTCn3  f  yj 
TR  l2 


TR  3C 
C  3TrJ 


(132) 


Substituting the power series expansion for γ given by equation (37) into
equation (132) yields the result given in equation (128). Thus dP ∼ n^3 for
low densities.

Case  d.  Adiabatic  Change  in  Pressure  at  Constant  Temperature. 

Combining  equation  (126)  with  equation  (119)  for  dT  -  0  yields 


dP 


S,T 


RTCn3  fTR  3C 
Tr  U  3Tr 


(133) 


Using equations (16), (34), and (33) allows equation (133) to be rewritten as
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dpi 


S  ,T 


RTCn' 

2T„ 


(Gq  +  GLn  +  G2n2  +  •••)  dTp 


(134) 


where 


CM 

i£_  .  j 

(136) 

J(fj 

-  2B) 

(136) 

J(f2 

-  f2  +  2f jB  -  3C) 

(137) 

where J is given by equation (122) and $f_1$ and $f_2$ are given by equations (32)
and (33) respectively. Therefore at low densities $dP \sim n^3$. Because the pressure
itself is proportional to n at low densities, in general $dP/P \sim n^2$ there, and the
efficiency of a gaseous gravitational wave detector can be improved by increasing
the density of the gas in the detector.

Consider  now  the  case  of  a  constant  pressure  system.  From  equation  (118) 
it  follows  that  the  constant  pressure  condition  is  written  as 


/dP \ 

T,T 


dV  + 


ft)  ■ 

'  r/v,t 


diR-0 


(138) 


Two  cases  of  the  constant  pressure  system  are  of  interest. 

Case e. Change in Volume at Constant Pressure and Temperature.

From  equation  (138)  and  equations  (6),  (14),  and  (23)  through  (26)  it  follows 
that 


dV 


P,T 


RTn2  5C 

^  aiR 


(139)
where Sr is given by equation (108B). Equation (139) can be rewritten using
equation (16) as follows


dv| 


P.T 


n[l  -  2nB  +  n2(4B2  -  3C)  -  *••]  |£-  dTR 


(140) 


Case  f.  Change  in  Temperature  at  Constant  Pressure  and  Volume. 

From  equation  (138)  and  equations  (3),  (14),  and  (23)  through  (26)  it  follows 
that 


«ip.v  ■  -  x  (X)TVdT: 


(141) 


RTn  3C 

yCw  9TR 


dT„ 


Using  equations  (34)  and  (37)  allows  equation  (141)  to  be  rewritten  as 


"Ip.V  -  -  T„2[l  -  £(n  +  <f2  -  f2)n2  -  •••]  |§-  dIR 

K 


(142) 


where $f_1$ and $f_2$ are given by equations (32) and (33) respectively.

10. CONCLUSION. The description of relativistic wave motion in real
gases must include the coupling of the matter and radiation fields with the
thermodynamic gauge parameters for matter and radiation. This means that the
quantities $P$, $\gamma$, $b$ and $P_r$, $\gamma_r$, $b_r$ are coupled as shown in equation (52).
This is true for wave motion in any relativistic physical system. The form
of the relativistic third virial coefficient of the ground state of a real
gas is affected by the ground state gauge parameters. When mechanical radiation
is present in real gases, the third virial coefficient of the radiation
itself is correspondingly affected by both the ground state and radiation
gauge parameters. Because only the third and higher virial coefficients are
affected by the gauge parameters, measurable relativistic effects should be
observed only at high pressures such as can occur in nuclear explosions in
the atmosphere, during the interaction of directed energy beams with the
atmosphere, or in high pressure laboratory experiments. The tidal effects of
gravitational radiation are expected to appear in the third and higher virial
coefficients of the real gases, and therefore these gases under high pressure
can serve as suitable materials for a gravitational wave detector.
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Figure  1.  Relationship  between  the 
relativity temperature and the conjugate
relativity temperature.


Figure 2. Dependence of reduced critical temperature on the reduced
relativity temperature. Note that $T_{crit} \neq T_B$ and $B^*(T_{crit}) < 0$.

Figure 3. Dependence of the reduced critical molar volume on the reduced
relativity temperature. Note that $V_{crit} \sim -B(T_{crit})$.


HAMILTONIAN  DEFORMATIONS  OF  INTEGRABLE, 
NONLINEAR  FIELD  EQUATIONS 
(WITH  APPLICATIONS  TO  OPTICAL  FIBERS) 


C.  R.  Menyuk,*'*  P.  K.  A.  Wai,*  H.  H.  Chen,*  and  Y.  C.  Lee*-6 


ABSTRACT. In integrable, nonlinear systems an arbitrarily shaped initial
pulse is known to break up into a series of solitons and a dispersive wave
component. It has been shown both analytically and numerically that this behavior
persists when substantial Hamiltonian deformations, which destroy the system’s
integrability, are present. By contrast, this behavior is usually destroyed by
non-Hamiltonian deformations even when they are quite small. Hence, it is usually
sufficient to know a deformation’s character to immediately determine its effect on
solitons. Application of this result to optical fiber communication is discussed.


I.  INTEGRABLE  EQUATIONS.  It  may  seem  odd  at  first  that  anything 
which  sounds  as  esoteric  as  Hamiltonian  deformations  could  have  something  useful 
to  tell  us  about  optical  fibers.  We  believe,  however,  that  our  results  are  a  nice 
example  of  how  a  physical/mathematical  principle  when  properly  understood  can 
lead  to  important  insights  into  the  operation  of  real-world  devices. 

Many,  if  not  most,  physical  systems  exhibit  turbulent  or  chaotic  behavior  in 
at  least  some  regimes.  Such  systems  are  appropriately  modeled  by  equations  like 
the  Navier-Stokes  equation  which  has  turbulent  solutions  at  high  Reynolds  numbers 
and  is  used  to  study  fluids.  In  many  important  cases,  however,  the  physical  systems 
exhibit  nice,  coherent  behavior  over  a  wide  range  of  parameters.  That  is  particularly 
the  case  in  devices  which  are  useful  for  something,  as  opposed  to  systems  which  are 
handed  to  us  by  nature,  since  one  usually  wants  the  device  to  behave  in  a  nice, 
predictable  manner. 

Nonlinear field equations which always exhibit coherent behavior include the
sine-Gordon equation

$$ u_{xt} = \sin u, \qquad (1) $$

which has been used to model self-induced transparency [1], the Korteweg-de Vries
equation

$$ u_t - 6 u u_x + u_{xxx} = 0, \qquad (2) $$

which has been used to model water waves in shallow channels [2] and ion-acoustic
waves in plasmas [3], and the nonlinear Schrödinger equation

$$ i u_t + \tfrac{1}{2} u_{xx} + |u|^2 u = 0, \qquad (3) $$

which has been used to model Langmuir waves in plasmas [4] and light pulses in
optical fibers [5]. We will be discussing this last application in far more detail at a
later point.

* Center for Nonlinear Studies, Los Alamos Scientific Laboratory, MS-258, Los
Alamos, NM 87545

These equations are often referred to as “integrable.” That is to say, they have
a number of special properties which the vast majority of field equations do not have.
Among the most important of these properties is a spectral transformation which
can be considered to be a nonlinear Fourier transform. The spectral transform
can be used to solve these special equations just as the usual Fourier transform
can be used to solve linear field equations. The transformation procedure is shown
schematically in Fig. 1 for the nonlinear Schrödinger equation, assuming that the
initial data $u(x, t = 0)$ falls off sufficiently rapidly as $x \to \pm\infty$ [6]. The spectral
transformation yields $[r(\xi,0), \zeta_j(0), C_j(0)]$. The quantity $r(\xi,0)$ depends
continuously on the variable $\xi$ and is directly analogous to the usual Fourier transform,
although it is not identical. Physically, it corresponds to a dispersive wave whose
amplitude vanishes as $t \to \infty$. In addition, the spectral transform yields a number
$N \ge 0$ of discrete pairs $(\zeta_j, C_j)$ which have no analogy in the usual Fourier
transform. These pairs correspond to solitons, nonlinear wave packets which propagate
without dispersing.
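
The claim that a soliton propagates without dispersing can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes Eq. (3) in the form $i u_t + \tfrac{1}{2} u_{xx} + |u|^2 u = 0$, uses a standard split-step Fourier scheme (our choice of method), and takes the one-soliton solution $u = \mathrm{sech}(x)\exp(it/2)$ as the test case. The grid, domain, and step count are illustrative assumptions.

```python
import numpy as np

def split_step_nls(u0, x, t_final, n_steps):
    """Integrate i u_t + (1/2) u_xx + |u|^2 u = 0 on a periodic grid
    using second-order (Strang) operator splitting."""
    dx = x[1] - x[0]
    k = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)  # angular wavenumbers
    dt = t_final / n_steps
    linear = np.exp(-0.5j * k**2 * dt)              # exact dispersion step
    u = u0.astype(complex)
    for _ in range(n_steps):
        u = u * np.exp(0.5j * np.abs(u)**2 * dt)    # half nonlinear step
        u = np.fft.ifft(linear * np.fft.fft(u))     # full linear step
        u = u * np.exp(0.5j * np.abs(u)**2 * dt)    # half nonlinear step
    return u

# Illustrative grid and initial data (assumptions, not from the paper).
x = np.linspace(-20.0, 20.0, 512, endpoint=False)
u0 = 1.0 / np.cosh(x)
t_final = 5.0

u = split_step_nls(u0, x, t_final, 2000)
exact = np.exp(0.5j * t_final) / np.cosh(x)         # one-soliton solution
max_error = np.max(np.abs(u - exact))
peak = np.max(np.abs(u))
```

The computed pulse keeps its sech profile and unit peak amplitude; only its phase advances, which is what "without dispersing" means here.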

Knowing the spectral data at $t = 0$, it is possible to immediately write down the
spectral data at $t = \tau$. They are [6]

$$ r(\xi,\tau) = r(\xi,0)\exp(2i\xi^2\tau), \quad \zeta_j(\tau) = \zeta_j(0), \quad C_j(\tau) = C_j(0)\exp(2i\zeta_j^2\tau). \qquad (4) $$

One can then use the inverse spectral transform to determine $u(x,\tau)$. The significance of
the spectral transform is that it allows us to determine $u(x,\tau)$ in three steps, shown
as solid lines in Fig. 1, no matter what the size of $\tau$. If one were to use the direct
route shown as a dashed line in Fig. 1, one would in general need to cut the time
axis into a number of pieces proportional to $\tau$ and determine the solution iteratively.
Thus, there exists some time $\tau$ beyond which the indirect approach always wins.
While this indirect approach has not been used much to date, there are a number
of problems where it would be useful.

FIGURE 1. Schematic illustration of the way in which the spectral transform and
its inverse can be used to solve the nonlinear Schrödinger equation.
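
The middle step of the three-step procedure, the time evolution of the spectral data in Eq. (4), is simple enough to sketch directly. The code below is an illustration only: the function name and the sample spectral data are hypothetical, and the direct and inverse spectral transforms (the other two steps) are not implemented.

```python
import numpy as np

def evolve_spectral_data(r0, xi, zeta, c0, tau):
    """Advance the spectral data from time 0 to time tau using Eq. (4).
    The cost does not depend on the size of tau."""
    r_tau = r0 * np.exp(2j * xi**2 * tau)      # continuous spectrum: phase only
    zeta_tau = zeta.copy()                     # discrete eigenvalues are invariant
    c_tau = c0 * np.exp(2j * zeta**2 * tau)    # discrete normalization constants
    return r_tau, zeta_tau, c_tau

# Hypothetical spectral data, chosen only to exercise the function.
xi = np.linspace(-3.0, 3.0, 7)
r0 = 0.1 * np.exp(-xi**2)
zeta = np.array([0.5j])
c0 = np.array([1.0 + 0.0j])

r1, z1, c1 = evolve_spectral_data(r0, xi, zeta, c0, tau=1000.0)

# Evolving in two stages gives the same result as a single jump.
ra, za, ca = evolve_spectral_data(r0, xi, zeta, c0, tau=400.0)
rb, zb, cb = evolve_spectral_data(ra, xi, za, ca, tau=600.0)
```

A single call reaches any time $\tau$, whereas a direct integration of the field equation needs a number of time steps proportional to $\tau$.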

The issue that we are concerned with in this presentation is that there are
many systems which behave integrably. That is to say, initial data breaks up into
a dispersive wave component and a number of solitons (or, more precisely, solitary
waves). It is natural to suppose that these systems can be well-modelled by one of
the integrable equations. If the actual system were to be perturbed away from the
integrable system by an amount of order $\epsilon$, then one might expect that the
integrable behavior would only appear for a time of order $\epsilon^{-1}$. On a longer time scale,
solitons would be destroyed. This expectation is borne out in practice when the
perturbations are dissipative or have an explicit space or time dependence; however,
when the perturbations are Hamiltonian and independent of space and time, that
is no longer the case. Indeed, the systems appear to act integrably on arbitrarily
long time scales. Moreover, they continue to act integrably when the Hamiltonian
deviations are so large that they can no longer be referred to as perturbations, but
must be considered substantial deformations. Why are systems so rugged under
the influence of Hamiltonian deformations, and what are the implications for
practical devices like optical fibers? We will be addressing these issues in the following
sections.

While this sort of behavior can be seen in a large number of real physical
systems, we mention here two numerical examples closely related to the nonlinear
Schrödinger equation:

$$ i u_t + \tfrac{1}{2} u_{xx} + |u|^2 u = -\{1 - |u|^2 - \exp(-|u|^2)\}\,u, \qquad (5) $$

which has been used to model Langmuir waves in plasmas [7], and

$$ i u_t + \tfrac{1}{2} u_{xx} + |u|^2 u = i\beta u_{xxx}, \qquad (6) $$

which has been used to study light pulses in optical fibers near the zero-dispersion
point [8,9]. The deformations in both Eqs. (5) and (6) are Hamiltonian. We
emphasize numerical results because in simulated systems the effect of dissipation can
be completely eliminated, which can never be the case in real systems. In numerical
solutions of Eq. (5), initial data are seen to break into a soliton and dispersive waves
when $|u|$ is as large as 2, so that the term on the right is making a large contribution.
Similar results are found when Eq. (6) is solved with $\beta$ arbitrarily large. Clearly,
then, the right-hand side can be a large deformation indeed! It should be noted
that while solitons continue to exist, their shapes and frequency shifts are observed
to change from what is predicted by the nonlinear Schrödinger equation, as shown
schematically in Fig. 2, something which must be explained theoretically.

FIGURE 2. Schematic illustration of shape renormalisation. Under the influence
of a Hamiltonian perturbation, a soliton’s shape will change from that shown as
a solid line to that shown as a dashed line.
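
To make "a large contribution" concrete, one can compare the size of the right-hand side of Eq. (5) with the cubic term $|u|^2 u$ on the left. The sketch below assumes the right-hand side is $-\{1 - |u|^2 - \exp(-|u|^2)\}u$ and evaluates both terms for a real, positive amplitude; the function names are ours and purely illustrative.

```python
import math

def cubic_term(amplitude):
    """The integrable nonlinearity |u|^2 u for a real, positive amplitude."""
    return amplitude**3

def deformation_term(amplitude):
    """Right-hand side of Eq. (5), -{1 - |u|^2 - exp(-|u|^2)} u,
    for a real, positive amplitude."""
    a2 = amplitude**2
    return -(1.0 - a2 - math.exp(-a2)) * amplitude

# Ratio of the deformation to the cubic term at small and large amplitude.
ratio_small = deformation_term(0.1) / cubic_term(0.1)
ratio_large = deformation_term(2.0) / cubic_term(2.0)
```

For small amplitudes the ratio behaves like $|u|^2/2$, so the deformation is a genuine perturbation; at $|u| = 2$ it is roughly three quarters of the cubic term, which is far outside any perturbative regime.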


II. HAMILTONIAN SYSTEMS. In order to demonstrate that Eqs. (5)
and (6) are Hamiltonian, it is sufficient to show that they can be derived from a
Hamiltonian functional. In the case of Eq. (6), this functional is

$$ H = -\frac{i}{2}\int dx\,\left[u_x u_x^* - |u|^4 + i\beta\,(u_{xx}^* u_x - u_{xx} u_x^*)\right]. \qquad (7) $$

Letting $q = u$ and $p = u^*$, one can show

$$ \dot q = \frac{\delta H}{\delta p}, \qquad \dot p = -\frac{\delta H}{\delta q}, \qquad (8) $$

as is appropriate for Hamiltonian systems, where the derivatives $\delta z/\delta y$ are
functional derivatives. A similar result can be obtained for Eq. (5). Such systems are
often referred to as infinite-dimensional Hamiltonian systems because each point in
x can be considered a separate degree-of-freedom. When one states that a
Hamiltonian system is integrable, one generally means that a canonical transformation
exists which yields a Hamiltonian independent of the new coordinates, depending
only on the new momenta. This point of view seems different from that of the
previous section where we said that the nonlinear Schrödinger equation could be
solved by making a spectral transformation; in fact, these two points of view are
equivalent. The spectral transformation turns out to be a canonical transformation
which yields a Hamiltonian only depending on the momenta.
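
The statement that Eq. (6) follows from a Hamiltonian functional can be verified in a few lines. The sketch below assumes the functional has the form $H = -\frac{i}{2}\int dx\,[u_x u_x^* - |u|^4 + i\beta(u_{xx}^* u_x - u_{xx} u_x^*)]$ with $q = u$, $p = u^*$, real $\beta$, and fields that vanish at infinity; the ordering of the factors in the $\beta$ term is our reading of the source and fixes the sign on the right-hand side.

```latex
\begin{aligned}
\frac{\delta}{\delta u^*}\int u_x u_x^*\,dx &= -u_{xx}, \qquad
\frac{\delta}{\delta u^*}\int |u|^4\,dx = 2|u|^2 u, \\
\frac{\delta}{\delta u^*}\int \left(u_{xx}^* u_x - u_{xx} u_x^*\right)dx &= u_{xxx} + u_{xxx} = 2u_{xxx}, \\
u_t = \frac{\delta H}{\delta u^*}
  &= -\frac{i}{2}\left[-u_{xx} - 2|u|^2 u + 2i\beta u_{xxx}\right]
   = \frac{i}{2}u_{xx} + i|u|^2 u + \beta u_{xxx}.
\end{aligned}
```

Multiplying the last line by $i$ and rearranging gives $i u_t + \tfrac{1}{2}u_{xx} + |u|^2 u = i\beta u_{xxx}$, which is Eq. (6). The equation for $\dot p = -\delta H/\delta u$ is its complex conjugate.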


Before demonstrating this point explicitly, it is useful to turn to a simpler
example to explain how these canonical transformations work. They are important
because when integrable field equations with Hamiltonian perturbations are
considered, it is possible to find an infinite series of canonical transformations which
eliminates order-by-order the dependence on the coordinates. This result explains
qualitatively why integrable behavior is rugged under the influence of Hamiltonian
deformations. (At least when the deformations are small!)

The example we will consider is a simple, finite-dimensional system

$$ H = \sum_i \frac{\omega_i}{2}\,(p_i^2 + q_i^2). \qquad (9) $$

The canonical transformation $(p_i, q_i) \to (P_i, Q_i)$, where

$$ p_i = (2P_i)^{1/2}\cos Q_i, \qquad q_i = (2P_i)^{1/2}\sin Q_i, \qquad (10) $$

reduces the Hamiltonian to the desired form

$$ H = \sum_i \omega_i P_i, \qquad (11) $$

which depends only on the momenta. As a consequence, the momenta are constant
in time, while the coordinates vary linearly. Writing the equations of motion,

$$ \dot P_i = 0, \qquad \dot Q_i = \omega_i, \qquad (12) $$

we obtain

$$ P_i = P_{i,0}, \qquad Q_i = Q_{i,0} + \omega_i t, \qquad (13) $$


where $P_{i,0}$ and $Q_{i,0}$ are constants of integration. In similar fashion, if we make the
transformation $u \to [P(\xi), Q(\xi), P_j, Q_j]$, where, in terms of the spectral data,

$$ P(\xi) = -\frac{1}{\pi}\ln[1 + |r(\xi)|^2], \quad Q(\xi) = \arg r(\xi), \quad P_j = 2i\zeta_j, \quad Q_j = -\ln C_j, \qquad (14) $$

we find that the transformed Hamiltonian becomes [6]

$$ H = \int_{-\infty}^{\infty} d\xi\,[2\xi^2 P(\xi)] + \text{(soliton terms depending only on the } P_j\text{)}, \qquad (15) $$

which only depends on the momenta. Hence, just as in the previous case, the
momenta are constant in time while the coordinates vary linearly.
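
The finite-dimensional case can be checked numerically. The sketch below assumes the oscillator Hamiltonian $H = \sum_i (\omega_i/2)(p_i^2 + q_i^2)$ and the transformation $p_i = (2P_i)^{1/2}\cos Q_i$, $q_i = (2P_i)^{1/2}\sin Q_i$; the frequencies and initial data are hypothetical. It evolves the system by advancing the angles linearly and compares the result with the direct solution of Hamilton's equations.

```python
import numpy as np

def to_action_angle(p, q):
    """Invert the transformation: P = (p^2 + q^2)/2, Q = atan2(q, p)."""
    return 0.5 * (p**2 + q**2), np.arctan2(q, p)

def from_action_angle(P, Q):
    """p = sqrt(2P) cos Q, q = sqrt(2P) sin Q."""
    return np.sqrt(2.0 * P) * np.cos(Q), np.sqrt(2.0 * P) * np.sin(Q)

# Hypothetical frequencies and initial conditions.
omega = np.array([1.0, np.sqrt(2.0)])
p0 = np.array([0.3, -0.2])
q0 = np.array([0.1, 0.4])
t = 7.5

# Three steps: transform, advance the angles linearly, transform back.
P0, Q0 = to_action_angle(p0, q0)
p_t, q_t = from_action_angle(P0, Q0 + omega * t)

# Direct solution of dq/dt = omega p, dp/dt = -omega q for comparison.
p_direct = p0 * np.cos(omega * t) - q0 * np.sin(omega * t)
q_direct = q0 * np.cos(omega * t) + p0 * np.sin(omega * t)

# The Hamiltonian takes the same value in both sets of variables.
h_old = np.sum(0.5 * omega * (p0**2 + q0**2))
h_new = np.sum(omega * P0)
```

The three-step route has the same structure as the spectral-transform route of Fig. 1: transform, evolve trivially, transform back.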


Suppose now that we perturb the finite-dimensional system by adding cubic
terms to the Hamiltonian,

$$ H = \sum_i \frac{\omega_i}{2}\,(p_i^2 + q_i^2) + a p_1^3 + b p_1^2 q_1 + \cdots \qquad (16) $$

In the limit where $p_i$ and $q_i$ are small, this perturbation only makes a small
contribution to the Hamiltonian. As long as all the $\omega_i$ are incommensurable, it is
possible to find a canonical transformation, using the Lie transform method or the
Poincaré-von Zeipel method, which eliminates the cubic terms at the expense of
introducing fourth and higher order terms [10], i.e. there exists a transformation
$[p_i, q_i] \to [P_i, Q_i]$ such that our Hamiltonian becomes

$$ H = \sum_i \frac{\omega_i}{2}\,(P_i^2 + Q_i^2) + \alpha P_1^4 + \cdots \qquad (17) $$


We can then eliminate the fourth order terms by making another, analogous
transformation and continue in this fashion order-by-order. Physically, this series of
transformations is possible because when $q_i$ and $p_i$ are sufficiently small, the effect
of the cubic perturbations, for the vast majority of initial conditions, is to deform
the orbit of the pair $(p_i, q_i)$ without destroying its neutral stability. By contrast, a
dissipative perturbation, no matter how small, will lead to a fundamental change
in orbit topology as shown qualitatively in Fig. 3.

FIGURE 3. Effect of Hamiltonian and non-Hamiltonian perturbations. A
Hamiltonian perturbation slightly deforms the trajectory, but it remains neutrally
stable. A dissipative perturbation, no matter how small, leads to a spiral trajectory
which ultimately falls into the origin.
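
The contrast between the two kinds of perturbation can be illustrated with a single oscillator. The sketch below is our own illustration, not taken from the paper: it assumes a cubic Hamiltonian perturbation $H = (p^2 + q^2)/2 + A q^3$ and, separately, a linear damping term, and integrates both with a classical Runge-Kutta scheme. The coefficients `A` and `GAMMA`, the initial condition, and the step size are arbitrary choices.

```python
import numpy as np

A = 0.1        # strength of the cubic Hamiltonian term (illustrative)
GAMMA = 0.01   # damping rate of the dissipative term (illustrative)

def hamiltonian_flow(y):
    # H = (p^2 + q^2)/2 + A q^3  gives  dq/dt = p, dp/dt = -q - 3 A q^2
    q, p = y
    return np.array([p, -q - 3.0 * A * q**2])

def dissipative_flow(y):
    # Damped oscillator, dq/dt = p, dp/dt = -q - GAMMA p (not Hamiltonian)
    q, p = y
    return np.array([p, -q - GAMMA * p])

def rk4(flow, y, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration."""
    for _ in range(n_steps):
        k1 = flow(y)
        k2 = flow(y + 0.5 * dt * k1)
        k3 = flow(y + 0.5 * dt * k2)
        k4 = flow(y + dt * k3)
        y = y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y

def energy(y, a):
    q, p = y
    return 0.5 * (p**2 + q**2) + a * q**3

y0 = np.array([0.2, 0.0])
y_ham = rk4(hamiltonian_flow, y0, 0.02, 25000)   # integrate to t = 500
y_dis = rk4(dissipative_flow, y0, 0.02, 25000)

# Relative energy drift under the Hamiltonian perturbation.
drift_ham = abs(energy(y_ham, A) - energy(y0, A)) / energy(y0, A)
# Fraction of the initial energy left under the dissipative perturbation.
ratio_dis = energy(y_dis, 0.0) / energy(y0, 0.0)
```

Under the Hamiltonian perturbation the orbit is deformed but stays on a closed curve of constant energy. Under damping the energy decays roughly like $\exp(-\Gamma t)$, so the orbit spirals into the origin however small the damping rate is.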

A similar series of transformations exists for Hamiltonian perturbations of
integrable field equations. We recall that the original transformation $u \to [P(\xi), Q(\xi),
P_j, Q_j]$ yields quantities which evolve linearly in time when $u_t$ is given by the
nonlinear Schrödinger equation. That is no longer the case once the equations are
perturbed. However, at any given order, the canonical transformations yield a new
set of quantities $[P(\xi), Q(\xi), P_j, Q_j]$ which evolve linearly in time through the order
to which we are working, i.e.

$$ P(\xi) = P_0(\xi), \qquad Q(\xi) = Q_0(\xi) + \Omega(\xi)\,t, $$
$$ P_j = P_{j,0}, \qquad Q_j = Q_{j,0} + \Omega_j t, \qquad (18) $$

where $P_0(\xi)$, $Q_0(\xi)$, $P_{j,0}$, $Q_{j,0}$, $\Omega(\xi)$, and $\Omega_j$ are all constant in time. Hence, just
as in the integrable case, it is possible to integrate the equation in a fixed number
of steps, as shown schematically in Fig. 4, independent of the length of time $\tau$ over
which one wishes to determine the solution. Why then are these perturbed
equations not also considered integrable? The reason is that in general this series of
transformations is only convergent for special choices of the initial conditions;
otherwise, the series is merely asymptotic, and only a finite number of transformations
can be usefully made.

At every order of the transformation, one finds that the topology of the
solution is unchanged; it still consists of a dispersive wave component and a number
of solitons which do not change in time when well-separated [11,12]. Hamiltonian
perturbations lead to no fundamental changes in the structure of the solution, in
contrast to dissipative perturbations.

FIGURE 4. Schematic illustration of the integration procedure for the perturbed
nonlinear Schrödinger equation when the perturbations are Hamiltonian. One
first makes a spectral transformation followed by a series of canonical
transformations to arrive at variables which evolve linearly in time. Having calculated the
new variables at the time $\tau$, one reverses the original sequence of transformations
to determine u.

It  should  be  noted  that  the  results  just  described  have  only  been  demonstrated 
in  detail  when  the  underlying,  integrable  system  is  the  Korteweg-de  Vries  equation 
[11,12],  although  the  nature  of  the  derivation  makes  it  seem  clear  that  similar  results 
will  hold  when  the  underlying  system  is  the  nonlinear  Schrodinger  equation  or  any 
of  a  set  of  similar  field  equations.  We  are  presently  studying  these  systems. 


III.  OPTICAL  FIBERS.  Optical  fibers  consist  of  a  glass  core  surrounded 
by  a  glass  cladding;  the  index  of  refraction  in  the  core  is  slightly  higher  than  in  the 
cladding,  implying  that  waves  will  propagate.  Essentially,  they  are  trapped  by  total 
internal  reflection  [13]. 

If the core is sufficiently small, 5 μm in diameter or less, then only a single
mode, the $HE_{11}$ mode, propagates, eliminating intermodal dispersion. Nonetheless,
single mode dispersion remains a serious problem limiting the (bit rate) × (propagation
length) values which can be attained in modern-day systems. The way in
which dispersion limits the bit rate is shown in Fig. 5. A train of pulses is launched
in the fiber. When a pulse is present in a given time slot, it is counted as a 1-bit,
and, when it is absent, it is counted as a 0-bit. As the pulses propagate along the
fiber, the dispersion causes spreading; after a long length, it is impossible for the
detection system to tell whether there is a 1-bit or a 0-bit in any given slot.

FIGURE 5. The effect of dispersion is illustrated schematically. As pulses
propagate along the fiber, they broaden, eventually overlapping.

For any given length, there is an optimum pulse size which yields the maximum
bit rate possible. If the pulse is too narrow initially, then it has a large bandwidth
and spreads very quickly due to the dispersion. If the pulse is too large initially,
then it stays too large. It is conventional to measure pulse widths in units of time. If
we write the initial pulse width as $\tau_0$, then the final pulse width after going through
a fiber of length L is

$$ \tau = \tau_0 + \frac{\lambda^3}{c^2}\left|\frac{d^2 n}{d\lambda^2}\right|\frac{L}{\tau_0}, \qquad (19) $$

where $\lambda$ is the light's wavelength, n is the index of refraction in the fiber, and
c is the speed of light in a vacuum. The minimum fiber loss rate is 0.2 dB/km
when $\lambda$ = 1.55 μm, from which we infer a 20 km propagation length before the
signal loss becomes severe [5,13]. From Eq. (19), we then infer a maximum bit
rate of 5 Gbit/sec. While this figure is quite large, the bit rates in communication
systems have been rising roughly exponentially as a function of time over the last
two centuries, with a break to a faster rise after 1950, as shown in Fig. 6. Unless
this curve magically bends over in the near future, it is clear that this bit rate will
soon be achieved.

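The existence of an optimum initial pulse width can be illustrated with a short calculation. The sketch below assumes the broadening law can be written as $\tau = \tau_0 + K/\tau_0$, where the constant $K$ lumps together the wavelength, the fiber dispersion, and the fiber length; this form and the numerical value of `K` are assumptions made for illustration and are not taken from the source.

```python
import math

def final_width(tau0, k_disp):
    """Final pulse width for an initial width tau0, assuming tau = tau0 + K / tau0."""
    return tau0 + k_disp / tau0

def optimum_initial_width(k_disp):
    """Setting d(tau)/d(tau0) = 1 - K / tau0^2 = 0 gives tau0 = sqrt(K)."""
    return math.sqrt(k_disp)

K = 2.5e-21                        # s^2, hypothetical dispersion-length constant
tau0_opt = optimum_initial_width(K)
tau_min = final_width(tau0_opt, K)  # equals 2 * sqrt(K)

# Narrower or wider initial pulses both end up wider than the optimum.
tau_narrow = final_width(0.5 * tau0_opt, K)
tau_wide = final_width(2.0 * tau0_opt, K)
```

With this assumed law the smallest achievable final width is $2\sqrt{K}$, so the attainable bit rate falls off as the inverse square root of the fiber length.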

Some years ago, Hasegawa and Tappert [14] suggested using the Kerr
nonlinearity to compensate for the dispersion. They showed that in the wavelength
range $\lambda$ > 1.3 μm, the so-called anomalous dispersion regime, light pulses are
well-described by the nonlinear Schrödinger equation and that dispersionless propagation
is therefore possible. This idea has since been tested experimentally and found to
be feasible [15,16].

Significant deformations of the nonlinear Schrödinger equation, both
Hamiltonian and non-Hamiltonian, can exist in real fibers. Hamiltonian deformations
include cubic dispersion, birefringence, and finite radial effects. Non-Hamiltonian
deformations include attenuation and Raman or Brillouin scattering. From the
results of the previous sections, we may infer the following: Hamiltonian deformations,
even large deformations, will have no adverse effect on the solitons; their shapes may
be slightly different from what the nonlinear Schrödinger equation predicts, but they
will still exist and propagate. By contrast, non-Hamiltonian deformations are very
destructive and must be dealt with in some way. The power of this result is that it
is not necessary to do any detailed analysis; it is only necessary to determine the
nature of the deformation, and one immediately knows whether it is likely to cause
trouble.

In order to verify these theoretical considerations and to determine the
maximum deformations which will still allow solitons to propagate in real fibers, our
group has in the past year mounted a systematic numerical investigation of all the
deformations which can play a major role in optical fibers. We have begun by
examining the behavior of pulses which are injected at the zero dispersion point,
$\lambda \approx 1.3$ μm. At this point, the usual quadratic dispersion goes to zero, unveiling the
effect of the cubic dispersion. Using appropriately normalized variables, one then
finds

$$ i u_s - i u_{\tau\tau\tau} + |u|^2 u = 0, \qquad (20) $$

where $s$ represents the length along the fiber and $\tau$ the time variation in the group
velocity frame. Note that we have reversed the roles of space and time from the
"standard" roles seen in Eqs. (1-3); we do so because the pulses in fibers are initially
specified for all time at a given point in space, rather than the reverse. We can obtain
Eq. (20) from Eq. (6) by letting $s = t$, $\tau = x/\beta^{1/3}$, and by letting $\beta \to \infty$.
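
The limit that takes Eq. (6) into Eq. (20) can be written out explicitly. The steps below assume Eq. (6) in the form $i u_t + \tfrac{1}{2}u_{xx} + |u|^2 u = i\beta u_{xxx}$ and the rescaling $s = t$, $\tau = x/\beta^{1/3}$.

```latex
\begin{aligned}
\partial_x &= \beta^{-1/3}\,\partial_\tau
  \;\Longrightarrow\;
  u_{xx} = \beta^{-2/3}\,u_{\tau\tau}, \qquad
  \beta\,u_{xxx} = u_{\tau\tau\tau}, \\
i u_s + \tfrac{1}{2}\,\beta^{-2/3}\,u_{\tau\tau} + |u|^2 u &= i\,u_{\tau\tau\tau}, \\
\beta \to \infty: \qquad i u_s - i\,u_{\tau\tau\tau} + |u|^2 u &= 0 .
\end{aligned}
```

The quadratic dispersion term is suppressed by the factor $\beta^{-2/3}$ while the cubic term stays of order one, which is why the zero dispersion point corresponds to an infinitely large Hamiltonian deformation of the nonlinear Schrödinger equation.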

It is of great practical interest to operate as close to the zero dispersion point
as possible. Since the dispersion is minimal at this point, the power needed to
generate a soliton is also minimal. Indeed, it may be possible to reduce the power
requirement to the point where a single laser diode can generate the pulses, a very
desirable result indeed! Previous workers had assumed that pulses launched at
the zero dispersion point would simply break apart rather than generate a soliton.
Having noted, however, that Eq. (20) can be generated by a Hamiltonian deformation
of the nonlinear Schrödinger equation, albeit an infinite one, and motivated by the results
which have been presented in the previous two sections, we decided to examine this
question more closely. Shown in Fig. 7 is the evolution of an initially Gaussian pulse
injected at the zero dispersion point. At large distances, a soliton can clearly be
seen emerging! To verify the existence of a soliton, we have looked for stationary
solutions of the form

$$ u(s,\tau) = u(\tau - s/v)\exp(i\Omega s), \qquad (21) $$

which converts Eq. (20) from a partial differential equation into an ordinary
differential equation. We have found these stationary solutions and checked that they
satisfy the original partial differential equation. We have also found that the center
frequency of the solitons is shifted down from the zero dispersion point into the
anomalous dispersion regime. From a physical standpoint, we might say that the
initial pulse has adjusted its frequency in order to minimize the effect of the
deformation. Hamiltonian deformations are benign because of the ability which pulses
have to adjust to them.

FIGURE 6. Bit rate in communication systems over the last two centuries (vertical
axis: data rate in bits per second). Note the break in the curve which occurred about
thirty years ago, roughly coincident with the invention of the laser.

FIGURE 7. Evolution of an initially Gaussian pulse as it propagates along an
optical fiber.
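
The reduction to an ordinary differential equation can be made explicit. The sketch below assumes Eq. (20) in the form $i u_s - i u_{\tau\tau\tau} + |u|^2 u = 0$ and writes the stationary ansatz as $u(s,\tau) = f(\theta)\exp(i\Omega s)$ with $\theta = \tau - s/v$; the symbols $f$ and $\theta$ are introduced here for convenience and do not appear in the source.

```latex
\begin{aligned}
u_s &= \left(-\tfrac{1}{v}\,f' + i\Omega f\right)e^{i\Omega s}, \qquad
u_{\tau\tau\tau} = f'''\,e^{i\Omega s}, \\
0 &= i\left(-\tfrac{1}{v}\,f' + i\Omega f\right) - i f''' + |f|^2 f
\;\Longrightarrow\;
f''' + \tfrac{1}{v}\,f' - i\Omega f + i|f|^2 f = 0 .
\end{aligned}
```

The result is a third-order ordinary differential equation for the generally complex profile $f(\theta)$, with the velocity $v$ and the frequency shift $\Omega$ as parameters.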


IV. CONCLUSION. Solitons persist in the face of large Hamiltonian
deformations while non-Hamiltonian deformations usually destroy them. This result has
important technical implications for light propagation in optical fibers. By simply
determining whether a deformation is Hamiltonian or non-Hamiltonian, we can
immediately tell whether or not it is likely to cause trouble. Since this result is quite
general, it is likely to be of importance not only in fibers, but in many other physical
systems as well.
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Abstract 

As is well known, extremely energetic explosions are capable of generating
intense electric and magnetic fields. When these explosions occur near a good
electrical conductor, such as the Earth, the behaviour of the electric and
magnetic fields is different from that obtained from isolated explosions. In particular,
the continuity of the tangential components of the quasi-static electric field
between the air and the ground requires the vanishing of the electric field along
the surface of the Earth. In previous studies, this boundary condition has been
held to imply that only electric fields whose angular dependence is given by
odd spherical harmonics contribute to the total field above the ground.

In this work, it is shown that solutions exist to Maxwell’s equations which
satisfy the boundary conditions at the Earth’s surface for the even spherical
harmonics and which are not zero throughout all space.

Maxwell’s equations for the fields are solved numerically, and results are
presented which indicate that the contribution of these fields to the total electric
field may be significant at certain angles.


1  Introduction 

It is well known that extremely energetic chemical explosions can produce electric
and magnetic signals of appreciable magnitude at considerable distances from the
location of the explosion (Glasstone and Dolan 1977). These fields seem to be
generated by two distinct mechanisms: the compression of magnetic flux within the
ionised gases at accelerating shock fronts (Wilhelm 1984, 1983) and the dust
cloud formed by the explosion (Bacon and Cherin 1984).

However, as one might have expected from the larger energies involved in nuclear
explosions, the electromagnetic fields produced in these cases are of proportionately
greater magnitudes. These are generated by electric currents caused by Compton
scattering of electrons by X- and γ-rays from the nuclear explosion. The fields
caused by nuclear explosions are generally known as electromagnetic pulses (EMP)
(e.g. Longmire and Gilbert 1980, Longmire 1978). In the case of nuclear
explosions, the fields which are not generated by Compton scattering can be significant
only at very late times.
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For  the  case  of  chemical  explosions,  the  dust  induced  electromagnetic  noise 
(DIEMN)  is  capable,  at  the  least,  of  interfering  significantly  with  radio  and  televi¬ 
sion  broadcasts.  In  the  case  of  nuclear  explosions,  the  fields  generated  can  possess 
field  strengths  of  several  kV/m  over  kilometre  distance  scales  and  time  scales  of 
milliseconds.  In  the  Johnston  Island  test  of  1962,  the  fields  created  by  a  nuclear 
explosion  seem  to  have  caused  current  surges  in  electrical  equipment  of  sufficient 
magnitude  to  have  triggered  fuses  in  the  street  lighting  system  in  Honolulu  some 
800  miles  distant  (Glasstonc  and  Dolan  1977).  Another  effect  of  interest  associated 
with  nuclear  explosions  is  the  presence  of  lightning  flashes  at  times  of  up  to  1  second 
after  the  explosion  (Wyatt  1980,  Uman  et  al  1972).  These  flashes  are  presumed  to 
have  been  produced  by  the  dielectric  breakdown  of  the  air  by  the  electric  fields 
generated  by  KM  I*. 

Extensive work has been done in the past few years on the theoretical calculation of EMP effects at various stages of the explosion. Particular interest has been paid to the EMP generated by an explosion close to the surface of the Earth, especially during the so-called quasi-static phase in which the rate of change with respect to time of the electric and magnetic fields is sufficiently slow that it may be neglected in Maxwell’s equations. It is well known that an electric field must vanish within a perfect conductor. In the region over which the Earth can be considered to be a perfect conductor, the quasi-static EMP field at the surface of the Earth should be zero. This boundary condition is automatically satisfied by odd multipoles of the electric field. From this condition, it has generally been assumed that the quasi-static electric field produced by a near surface blast can consist only of odd multipoles of the field throughout all space (e.g. Downey 1983, Grover 1980).

In this paper, it is shown that the condition that the quasi-static electric field vanish along the surface of the Earth does not imply that the even multipoles of the field cannot exist, and further, that these fields can be of appreciable magnitudes. At the surface of the Earth, these even multipole fields can induce a surface charge density which counteracts the radial field there and hence satisfies the boundary conditions. At locations other than the surface of the Earth, this cancellation is not complete, leaving a finite field composed of the sum of the original even multipole field and the field produced by the surface charge induced at the Earth’s surface. Sample calculations for estimates of typical field strengths are presented.

2  Maxwell’s Equations for the Quasi-static Phase of EMP

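The two relevant time dependent Maxwell equations are

$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \tag{1}$$

$$\nabla \times \mathbf{B} = \mu \mathbf{J} + \mu\epsilon \frac{\partial \mathbf{E}}{\partial t}, \tag{2}$$

where $\mathbf{B}$ is the magnetic intensity in webers/m², $\mathbf{E}$ the electric field in volts/m, $\mathbf{J}$ the current density in amps/m², $\epsilon$ the dielectric permittivity in farads/m, and $\mu$ the magnetic permeability in henrys/m. Throughout the course of this paper, we shall be concerned only with the calculation of the fields in air, and hence $\epsilon$ and $\mu$ will be assumed to take their free space values, $\epsilon_0$ and $\mu_0$.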

Assuming  that  the  fields  are  evaluated  at  times  late  enough  that  the  fields  are 
nearly  constant  in  time,  eqs.  (1)  and  (2)  become 

$$\nabla \times \mathbf{E} = 0, \tag{3}$$

$$\nabla \times \mathbf{B} = \mu_0 \mathbf{J}, \tag{4}$$

in  air. 

Now, the current density $\mathbf{J}$ can be divided into two parts, the source current $\mathbf{J}_s$, and the conduction current $\mathbf{J}_c$. The source current arises from the ionisation created by the explosion; its exact form depends on whatever the dominant ionisation mechanism is at the time the fields are evaluated. For chemical explosions, this can be the ionisation created by the shock or collisions with dust particles. In nuclear explosions, $\mathbf{J}_s$ is created primarily by Compton scattering of the electrons in the air by γ- and X-rays. Since to a good approximation Ohm’s law is obeyed in air, one can write

$$\mathbf{J} = \mathbf{J}_s + \sigma \mathbf{E} \tag{5}$$

where $\mathbf{J}_c = \sigma \mathbf{E}$ and the conductivity $\sigma$ is measured in 1/(ohm-m). In air, $\sigma$ depends upon the value of $\mathbf{E}$ (e.g. Lee 1980, Longmire and Gilbert 1978). However, up to fields of strength ~ 100 kV/m, this dependence is small and can be neglected.

After substituting eq. (5) into eq. (4) and taking the divergence, one obtains

$$-\nabla \cdot (\sigma \mathbf{E}) = \nabla \cdot \mathbf{J}_s. \tag{6}$$
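The step from eqs. (4) and (5) to eq. (6) can be written out explicitly. The lines below are a worked restatement that uses only the identity that the divergence of a curl vanishes; $\mathbf{J}_s$ denotes the source current and $\sigma$ the air conductivity, as in eq. (5).

```latex
\begin{align*}
\nabla \cdot (\nabla \times \mathbf{B}) &= 0
  && \text{(vector identity)} \\
\Rightarrow \quad \mu_0 \, \nabla \cdot (\mathbf{J}_s + \sigma \mathbf{E}) &= 0
  && \text{(eqs. (4) and (5))} \\
\Rightarrow \quad -\nabla \cdot (\sigma \mathbf{E}) &= \nabla \cdot \mathbf{J}_s
  && \text{(eq. (6))}
\end{align*}
```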

In order to satisfy eq. (3), the electric field must be derivable from a potential, thusly:

$$\mathbf{E} = -\nabla \Phi \tag{7}$$

where

$$\Phi = \sum_{m=-\infty}^{\infty} \sum_{n=0}^{\infty} Z_n^m(r)\, S_n^m(\theta, \phi). \tag{8}$$

In eq. (8), $Z_n^m$ is the function containing the radial dependence of the potential associated with each surface spherical harmonic, $S_n^m$. The surface spherical harmonics of angular order $m$ and rank $n$ are defined by

$$S_n^m(\theta, \phi) = P_n^m(\cos\theta)\, e^{im\phi} \tag{9}$$

where the associated Legendre functions, $P_n^m$, are given by

and the Legendre polynomials, $P_n$, by

$$P_n(\cos\theta) = \frac{(-1)^n}{2^n n!} \frac{d^n (\sin^{2n}\theta)}{d(\cos\theta)^n}, \tag{11}$$

with $r$, $\theta$, and $\phi$ the standard spherical polar co-ordinates with the origin located at the site of the original explosion, as shown in Fig. 1.

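The substitution of eqs. (7) - (11) into eq. (6) yields, after considerable simplification,

$$\nabla \cdot \mathbf{J}_s = \sum_{m=-\infty}^{\infty} \sum_{n=0}^{\infty} \left[ \sigma \frac{d^2 Z_n^m}{dr^2} + \left( \frac{2\sigma}{r} + \frac{d\sigma}{dr} \right) \frac{dZ_n^m}{dr} - n(n+1) \frac{\sigma}{r^2} Z_n^m \right] S_n^m(\theta, \phi). \tag{12}$$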

It is assumed that the divergence of $\mathbf{J}_s$, the source current density, can be expressed by

$$\nabla \cdot \mathbf{J}_s = \sum_{m=-\infty}^{\infty} \sum_{n=0}^{\infty} F_n^m(r)\, S_n^m(\theta, \phi) \tag{13}$$

where $F_n^m$ is the function containing the radial dependence of the divergence of the source current density associated with each surface spherical harmonic, $S_n^m$; the conductivity, $\sigma$, is assumed to be a function of $r$ only. In fact, the conductivity exhibits a weak dependence on quantities such as local field strength, angle, and water vapour content of the air. The assumption that the conductivity $\sigma$ is a function only of $r$ seems to be adequate at late times, at least as a first approximation (Grover 1980).

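Multiplying eq. (12) by the spherical harmonic $S_{n'}^{m'}$, and using

$$\int_0^{2\pi} \int_0^{\pi} S_n^m(\theta, \phi) \left[ S_{n'}^{m'}(\theta, \phi) \right]^{*} \sin\theta \, d\theta \, d\phi = \frac{4\pi}{2n+1} \frac{(n+m)!}{(n-m)!} \, \delta_{nn'} \delta_{mm'}, \tag{14}$$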

one can separate the radial functions associated with each spherical harmonic, and obtain a 2nd order differential equation for $Z_n^m$, the radial dependence of the electric potential, thusly:

$$\sigma \frac{d^2 Z_n^m}{dr^2} + \left( \frac{2\sigma}{r} + \frac{d\sigma}{dr} \right) \frac{dZ_n^m}{dr} - n(n+1) \frac{\sigma}{r^2} Z_n^m = F_n^m. \tag{15}$$


Grover (1980) and others (e.g. Hodgdon 1984) have derived similar equations, with the important difference that the summations in eqs. (12) - (13) were taken over only the odd values of $n$. This was done in order to satisfy the boundary condition that the radial component, $E_r$, of the electric field must vanish identically over the surface of the Earth. However, as can be seen, if $F_n^m$ is not identically zero for all even $n$, this ignores those multipoles excited by those current densities with even values of $n$. Since, in fact, the even multipoles of $\mathbf{J}_s$ are not all zero, another way of satisfying the boundary conditions must exist.

The boundary conditions on the fields at the surface of the Earth, assuming it to be an infinite plane located at $\theta = 90^\circ$, are:


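$$\hat{n} \times (\mathbf{E}_2 - \mathbf{E}_1) = 0, \tag{16}$$

$$\hat{n} \cdot (\mathbf{D}_2 - \mathbf{D}_1) = \omega, \tag{17}$$

$$\hat{n} \cdot (\mathbf{B}_2 - \mathbf{B}_1) = 0, \tag{18}$$

$$\hat{n} \times (\mathbf{H}_2 - \mathbf{H}_1) = \mathbf{K}, \tag{19}$$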

(Jackson 1962, Stratton 1941). In eqs. (16) - (19), the variables with subscript 1 refer to the Earth, and those with subscript 2 refer to the air. As before, $\mathbf{E}$ is the electric field, and $\mathbf{B}$ the magnetic induction. As well, $\mathbf{D}$ is the electric displacement, and $\mathbf{H}$ the magnetic field, $\omega$ is the surface charge density on the Earth, $\mathbf{K}$ the surface current density, and $\hat{n}$ the outward unit normal to the surface of the Earth. In this paper, we shall make use only of eqs. (16) - (17), but eqs. (18) - (19) are included for the sake of completeness.

At this stage, it will be assumed that all physical processes involved in the explosion and the field are symmetrical with respect to rotations in the $x$-$y$ plane (i.e. about the $z$ axis) and hence that the resulting fields are independent of the $\phi$ co-ordinate. This implies that the angular order $m$ of the surface spherical harmonics in eqs. (8) - (15) is always 0, and hence that one is left only with a summation over the rank $n$.

If the Earth is assumed to be a perfect conductor, the electric fields must vanish within it. Hodgdon (1984) has pointed out that sufficiently close to an explosion, the conductivity of the air first approaches and then surpasses that of the Earth. Bearing in mind, then, that these boundary conditions can only be said to apply to that part of the Earth in which the conductivity is at least an order of magnitude greater than that in the air, eqs. (16) - (17) become:

$$\hat{n} \times \mathbf{E}_2 = 0, \tag{20}$$

$$\hat{n} \cdot \mathbf{D}_2 = \omega. \tag{21}$$

For simplicity, it will nonetheless be assumed that the boundary conditions, eqs. (20) - (21), apply on the whole surface of the Earth. In the numerical calculations, this simply means that one must confine oneself to the region in which this applies. Incidentally, for explosions over sea water, the surface of the Earth can be considered to be a perfect conductor much closer to the explosion site than would be the case for an explosion over soil.

The outward normal to the surface of the Earth is the unit vector along the $z$ axis. Using this, and substituting eqs. (7) - (8) into eqs. (20) - (21), one obtains

$$\sum_{n=0}^{\infty} \left( \sin\theta \, \frac{dZ_n^0}{dr} P_n(\cos\theta) + \cos\theta \, \frac{Z_n^0}{r} \frac{dP_n(\cos\theta)}{d\theta} \right) \Bigg|_{\theta = 90^\circ} = 0 \tag{22}$$

and

$$\sum_{n=0}^{\infty} \left( \cos\theta \, \frac{dZ_n^0}{dr} P_n(\cos\theta) - \sin\theta \, \frac{Z_n^0}{r} \frac{dP_n(\cos\theta)}{d\theta} \right) \Bigg|_{\theta = 90^\circ} = -\frac{\omega}{\epsilon_0} \tag{23}$$

as boundary conditions. The summations over the odd Legendre polynomials $P_n$ vanish, leaving only the summations over the even polynomials to be satisfied. The usual practice (Hodgdon 1984, Grover 1980) has been to satisfy the boundary condition by insisting $dZ_n^0/dr = Z_n^0 = 0$ throughout all space for the even spherical harmonics. As was indicated above, this seems unlikely if the current density depends to some degree upon the even spherical harmonics. What seems more likely is that the non-zero field at the surface of the Earth draws charges there which arrange themselves in such a fashion as to cancel the inducing field at the surface, but not necessarily throughout all space.
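The parity argument behind eq. (22) can be checked numerically. The following is a minimal sketch, assuming the axially symmetric case $m = 0$; the function names `legendre` and `angular_factors` are illustrative and do not come from the paper. It evaluates the two angular factors in each term of eq. (22) at $\theta = 90^\circ$.

```python
import math

def legendre(n, x):
    """Return (P_n(x), dP_n/dx) from the three-term recurrence, for |x| < 1."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x                      # P_0 and P_1
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    # Derivative identity: (1 - x^2) P_n'(x) = n (P_{n-1}(x) - x P_n(x))
    dp = n * (p_prev - x * p) / (1.0 - x * x)
    return p, dp

def angular_factors(n, theta):
    """Factors multiplying dZ/dr and Z/r in the n-th term of eq. (22)."""
    x = math.cos(theta)
    p, dp_dx = legendre(n, x)
    dp_dtheta = -math.sin(theta) * dp_dx    # chain rule with x = cos(theta)
    return math.sin(theta) * p, math.cos(theta) * dp_dtheta

for n in range(6):
    radial, polar = angular_factors(n, math.pi / 2)
    print(n, round(radial, 12), round(polar, 12))
```

On the ground plane the factor multiplying $dZ_n^0/dr$ is $P_n(0)$, which is zero for odd $n$ and non-zero for even $n$, while the factor multiplying $Z_n^0/r$ vanishes for every $n$ because of the $\cos\theta$ prefactor. Only the even terms therefore remain to be satisfied.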

Let the total potential, $\Phi_T$, throughout all space be given by

$$\Phi_T = \Phi + \Phi_1 \tag{24}$$

where $\Phi$ is the potential as given by eqs. (8) and (12), caused by the source and conduction currents, and $\Phi_1$ is the potential induced by the surface charge density $\omega$ to counteract $\Phi$ at the surface of the Earth. Substituting eq. (24) into eqs. (16) - (17), one obtains

$$\frac{\partial \Phi_1}{\partial r} \Bigg|_{\theta = 90^\circ} = -\frac{\partial \Phi}{\partial r} \Bigg|_{\theta = 90^\circ}, \tag{25}$$

$$\frac{1}{r} \left( \frac{\partial \Phi_1}{\partial \theta} + \frac{\partial \Phi}{\partial \theta} \right) \Bigg|_{\theta = 90^\circ} = \frac{\omega}{\epsilon_0}. \tag{26}$$

Since eq. (25) is true over the whole $\theta = 90^\circ$ plane, one can integrate eq. (25) over the surface of the Earth to obtain

$$\Phi_1(\rho) = -\Phi(\rho) + C \tag{27}$$

where $\Phi_1(\rho)$ is the potential along the Earth and $C$ is a constant of integration. Since at great distances from the initial explosion, the potential caused by it must drop to 0, $C = 0$.

Evidently,

$$\Phi_1(\rho, z) = \int_0^{\infty} f(k)\, e^{-kz} J_0(k\rho)\, dk \tag{28}$$

where $\Phi_1(\rho, z)$ is now the potential throughout all space due to the surface charge induced on the Earth, as a function of the cylindrical co-ordinates $\rho$ and $z$ centred at $r = 0$, $J_0(k\rho)$ is the 0th order Bessel function and $f(k)$ is an unknown function to be determined from the boundary conditions (Jackson 1962). Multiplying both sides of eq. (28) by $\rho J_0(k'\rho)$ and integrating with respect to $\rho$ from 0 to $\infty$, one obtains

$$\int_0^{\infty} \rho\, \Phi_1(\rho, z)\, J_0(k'\rho)\, d\rho = \int_0^{\infty} f(k)\, e^{-kz} \int_0^{\infty} \rho\, J_0(k\rho)\, J_0(k'\rho)\, d\rho\, dk. \tag{29}$$

Setting $z = 0$ in eq. (29), one can evaluate the right hand integral to obtain an expression for $f(k)$:

$$f(k) = \int_0^{\infty} k \rho'\, \Phi_1(\rho')\, J_0(k\rho')\, d\rho'. \tag{30}$$

Substituting eq. (30) into eq. (28) one obtains finally that

$$\Phi_1(\rho, z) = \int_0^{\infty} \int_0^{\infty} k\, e^{-kz} J_0(k\rho)\, \rho'\, \Phi_1(\rho')\, J_0(k\rho')\, d\rho'\, dk. \tag{31}$$

Since the potential $\Phi_1(\rho)$ on the surface of the Earth is known from eq. (27) in principle, eq. (31) provides a means of calculating the potential resulting from the surface charge induced on the Earth to cancel the field there. Using the series expansion for the 0th order Bessel function,

$$J_0(k\rho') = \sum_{l=0}^{\infty} (-1)^l \frac{(k\rho')^{2l}}{2^{2l}\, l!\, \Gamma(l+1)}, \tag{32}$$

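eq. (31) becomes

$$\Phi_1(\rho, z) = \sum_{l=0}^{\infty} \frac{(-1)^l}{2^{2l}\, l!\, \Gamma(l+1)} \int_0^{\infty} \Phi_1(\rho')\, \rho'^{\,2l+1}\, d\rho' \int_0^{\infty} k^{2l+1} J_0(k\rho)\, e^{-kz}\, dk \tag{33}$$

where $\Gamma$ is the gamma function. From Gradshteyn and Ryzhik (1971), pg 711,

$$\int_0^{\infty} x^{\mu-1} J_\nu(\beta x)\, e^{-\alpha x}\, dx = (\alpha^2 + \beta^2)^{-\mu/2}\, \Gamma(\nu + \mu)\, P_{\mu-1}^{-\nu}\!\left[ \frac{\alpha}{(\alpha^2 + \beta^2)^{1/2}} \right], \tag{34}$$

$$\alpha > 0, \quad \beta > 0, \quad \mathrm{Re}(\nu + \mu) > 0$$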

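where $J_\nu$ is the $\nu$th order Bessel function. As can be seen, eq. (33) may be evaluated by the use of eq. (34), with $\alpha = z$, $\beta = \rho$, $\mu = 2l + 2$, $\nu = 0$. Hence,

$$\int_0^{\infty} k^{2l+1} J_0(k\rho)\, e^{-kz}\, dk = \frac{\Gamma(2l+2)}{(z^2 + \rho^2)^{l+1}}\, P_{2l+1}\!\left[ \frac{z}{(z^2 + \rho^2)^{1/2}} \right], \qquad z > 0, \ \rho > 0. \tag{35}$$

By definition,

$$r = (z^2 + \rho^2)^{1/2}, \qquad \cos\theta = \frac{z}{r}, \qquad \Gamma(n+1) = n!, \ n \in I, \tag{36}$$

and so

$$\Phi_1(r, \theta) = \sum_{l=0}^{\infty} (-1)^l \frac{(2l+1)!}{2^{2l} (l!)^2} \frac{P_{2l+1}(\cos\theta)}{r^{2l+2}} A_l \tag{37}$$

where

$$A_l = \int_0^{\infty} \Phi_1(\rho')\, \rho'^{\,2l+1}\, d\rho'. \tag{38}$$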

It should be noted that eq. (37) is not valid for $z = 0$ (i.e. $\theta = 90^\circ$). However, the potential and radial field are known on that plane since they must exactly counteract those produced by the source and conduction currents. In principle, then, the fields caused by the surface charge density on the Earth are known.
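The lowest-order case of the integral in eq. (35) can be checked by direct quadrature. The sketch below assumes $l = 0$, for which the right-hand side reduces to $z / r^3$; the helper names (`bessel_j0`, `simpson`, `lhs_eq35_l0`, `rhs_eq35_l0`) and the sample point are illustrative choices, not taken from the paper. The Bessel function is summed from the series of eq. (32), which is adequate only for moderate arguments.

```python
import math

def bessel_j0(x, max_terms=200):
    """J_0(x) from the power series of eq. (32); fine for moderate |x|."""
    term = 1.0
    total = 1.0
    for l in range(1, max_terms):
        term *= -(x * x) / (4.0 * l * l)    # ratio of successive series terms
        total += term
        if abs(term) < 1e-18:
            break
    return total

def simpson(f, a, b, n):
    """Composite Simpson's 1/3 rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def lhs_eq35_l0(rho, z, k_max=40.0, n=4000):
    """Left side of eq. (35) for l = 0, truncated at k_max (assumes z * k_max >> 1)."""
    return simpson(lambda k: k * bessel_j0(k * rho) * math.exp(-k * z), 0.0, k_max, n)

def rhs_eq35_l0(rho, z):
    """Right side of eq. (35) for l = 0: Gamma(2) * P_1(z/r) / r^2 = z / r^3."""
    r = math.hypot(rho, z)
    return z / r**3

print(lhs_eq35_l0(0.5, 1.0), rhs_eq35_l0(0.5, 1.0))
```

The two printed values should agree closely for $z > 0$, which is the range of validity stated with eq. (35).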

To sum up: in this section we have derived the equations governing the electric fields induced by electric currents in the atmosphere from explosions of various types. We have shown how the fields may be decomposed into multipole fields and that where the source and conduction currents are dependent upon particular multipoles, electric fields which depend on those multipoles are created. From this it follows that in general, both even and odd multipole fields exist as a result of an explosion.

Where the conductivity of the Earth is sufficiently high that it may be considered a perfect conductor with respect to the air, the boundary condition on the field requires that the component of the field along the ground must vanish. For the odd multipoles of the field, this condition is satisfied automatically. For the even multipoles, it is satisfied by the appearance of a surface charge density which produces a field which counteracts the original field at the surface of the Earth. However, the field which results from the sum of these two fields need not be zero everywhere else, and hence the even multipole fields can contribute to the total field.


3  Numerical  Methods 

Before one attempts numerical solutions of the field equations, eq. (15), it is necessary to know the conductivity $\sigma$, and the source currents $\mathbf{J}_s$. Both of these depend upon the precise nature of the ionisation process. Since the most interesting cases from a theoretical standpoint occur when the fields are produced by a nuclear explosion, it was decided to choose expressions for $\sigma$ and $\mathbf{J}_s$ appropriate to a thermonuclear explosion. Hence, at this point the further development of the field equations will be confined to the specific case of the fields generated by a thermonuclear explosion.

The total atmospheric conductivity is composed of two parts: an ionic and an electronic conductivity. Each Compton recoil electron produces about thirty thousand ion-electron pairs. At early times, the electronic conductivity dominates; at late times, the ionic dominates. The expression for the total conductivity is hence

$$\sigma = \sigma_e + \sigma_i \tag{39}$$

where $\sigma_e$, the electronic conductivity, is given by

$$\sigma_e = e \mu_e \frac{S}{\alpha_e} \tag{40}$$

and $\sigma_i$, the ionic conductivity, is given by

(Downey 1983, Wyatt 1980, Grover 1980). In eqs. (40) and (41) $e$ is the charge on the electron, $\mu_e$ the electron mobility, $S$ the local ionisation rate, $\alpha_e$ the electron attachment rate, $\mu_i$ the ionic mobility, and $\gamma_i$ the ion-ion recombination rate. The ionisation rate $S$ is assumed to have the form

$$S = S_0 \frac{\exp(-r/\lambda)}{r^2} \tag{42}$$

where $\lambda$ is the effective mean free path of the gamma rays, $S_0$ is a constant for a given time and yield, and $r$ is, as above, the radial co-ordinate of a spherical co-ordinate system centred at the blast site.

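For convenience, we shall define

$$\begin{aligned}
F_0(t) &= -3.9 \times 10^{-22}\, Y_0 N_a \exp(-8.33 \times 10^{2}\, t), \\
G_0(t) &= 8.2 \times 10^{-22}\, Y_0 N_a \exp(-8.33 \times 10^{2}\, t), \\
H_0(t) &= -2.8 \times 10^{-23}\, Y_0 N_a \exp(-16.7\, t), \\
F(r,t) &= \frac{F_0(t)}{r^2} \left[ \exp(-2.65 \times 10^{-5} \rho_0 r) - \exp(-1.04 \times 10^{-[\text{illegible}]} \rho_0 r) \right], \\
G(r,t) &= \frac{G_0(t)}{r^2} \exp(-4.61 \times 10^{-5} \rho_0 r), \\
H(r,t) &= \frac{H_0(t)}{r^2} \left[ \exp(-2.20 \times 10^{-5} \rho_0 r) - \exp(-4.78 \times 10^{-5} \rho_0 r) \right], \\
X(r,t) &= F(r,t) + H(r,t), \\
Y(r,t) &= 16 F(r,t) + 1.3 H(r,t), \\
U(r,t) &= G(r,t), \\
V(r,t) &= -G(r,t).
\end{aligned} \tag{43}$$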


In terms of the functions defined in eq. (43), the source current densities are given by

$$J_r = F(r,t)(1 + 16\cos\theta), \qquad J_\theta = G(r,t)(1 - \cos\theta), \tag{44}$$

for ground capture sources, and

$$J_r = H(r,t)(1 + 1.3\cos\theta), \qquad J_\theta = 0, \tag{45}$$

for air capture sources (Downey 1983). In eqs. (43) - (45), $Y_0$ is the total yield in kilotons, $N_a$ is the number of neutrons/kiloton, $\rho_0$ is the air density in mg/cm³, $r$ is the radial distance from the blast in metres, $\theta$ is the polar angle, $t$ the retarded time in seconds, $J_r$ the radial current density in abamps/cm², and $J_\theta$ the polar current density in abamps/cm². The total current density at any retarded time $t$ must be the vector sum of eqs. (44) - (45). Hence, the components of the source current density are

$$J_r = X(r,t) + Y(r,t)\cos\theta, \tag{46}$$

$$J_\theta = U(r,t) + V(r,t)\cos\theta. \tag{47}$$


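Therefore,

$$\nabla \cdot \mathbf{J}_s = \frac{\partial X}{\partial r} + 2\frac{X}{r} + \left( \frac{\partial Y}{\partial r} + 2\frac{Y}{r} \right) \cos\theta + \frac{U}{r} \frac{\cos\theta}{\sin\theta} + \frac{V}{r} \frac{\cos^2\theta - \sin^2\theta}{\sin\theta}. \tag{48}$$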


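Then, the substitution of eq. (48) into eq. (13), along with the use of eq. (14), yields

$$\begin{aligned}
F_0^0 &= \frac{\partial X}{\partial r} + 2\frac{X}{r}, \\
F_1^0 &= \frac{\partial Y}{\partial r} + 2\frac{Y}{r} + \frac{3\pi}{4} \frac{U}{r}, \\
F_n^0 &= \frac{(2n+1)\pi V}{r} \sum_{k=0}^{n/2} (-1)^k \frac{(2n-2k)!\,(n-2k+2)(n-2k)}{2^{2n-2k+3}\, k!\,(n-k)!\,\left( ((n+2)/2 - k)! \right)^2} && \text{for } n \text{ even}, \ n \ge 2, \\
F_n^0 &= \frac{(2n+1)\pi U}{r} \sum_{k=0}^{(n-1)/2} (-1)^k \frac{(2n-2k)!}{2^{2n-2k}\, k!\,(n-k)!\,\left( ((n-1)/2 - k)! \right)^2 (n-2k+1)} && \text{for } n \text{ odd}, \ n \ge 3.
\end{aligned} \tag{49}$$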

By substituting eq. (49) back into eq. (15), it is possible to solve numerically for the radial part of the potential and the field. It should, however, be noted that eq. (49) must be converted into amps/m³ in order to be consistent with the expression for the conductivity. It is generally most convenient in numerical solutions of differential equations to use scaling factors to form dimensionless equations. By defining

where


Q  =  LJr(^(  r„)),  (51) 

and $L$, $T$, $M$, and $Q$ are the scaling factors in MKS units for length, time, mass and electric charge, respectively, with $r_0$ being the smallest value of $r$ that appears in the integration, one is allowed to specify any two of $L$, $T$, $M$ and $Q$ as free parameters. It was found to be most convenient to specify $T = 16.7$ secs, and $L$ as twice the maximum value of $r$ used in the integration. Using the dimensionless quantities defined above, the field equation, eq. (15), becomes

$$\frac{dy_1}{dr'} = y_2, \qquad \frac{dy_2}{dr'} = -\frac{1}{\sigma'} \left( \frac{2\sigma'}{r'} + \frac{d\sigma'}{dr'} \right) y_2 + n(n+1) \frac{y_1}{r'^2} + \frac{(F_n^0)'}{\sigma'}. \tag{52}$$

Equations (52) were solved using a four-point Runge-Kutta algorithm with automatic error controls. It was necessary to find initial solutions to begin the integration. Unfortunately, the field equations, eq. (52), the expression for the conductivity $\sigma$, eq. (39), and the expression for the excitation function $F_n^0$, eq. (49), are all singular at the origin. This implies that the expressions used for $\sigma$, $\mathbf{J}_s$, and $F_n^0$ cease to be applicable close to the blast site and others must be used. The derivation of these, however, presents considerable problems. Instead of attempting to determine initial values for the field and potential near the blast site, it was decided to use the fact that at very large distances from the blast, both field and potential must be zero. Hence, if one starts the integration at a sufficiently great distance from the blast site and integrates inwards towards $r = 0$, the initial values of $y_1$ and $y_2$ can both be set to zero. In all of the cases examined in this work, it was found that the excitation function $F_n^0$ and the source current $\mathbf{J}_s$ were negligible ($(F_n^0)' < 10^{-[\text{illegible}]}$) at distances of $r = 28$ kilometres. Accordingly, the integration was begun at that point with $y_1 = y_2 = 0$ and stopped at $r = 0.2$ kilometres. At that point, the expressions for the current density and the conductivity, eqs. (40) and (41), will certainly cease to apply (Downey 1983, Wyatt 1980). In point of fact, $r = 0.4$ kilometres is probably the limit to which eqs. (40) and (41) are even approximately accurate, but the integration was carried out to $r = 0.2$ kilometres simply for completeness, although the values obtained for $y_1$ and $y_2$ for $r < 0.4$ km are of questionable accuracy.
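The inward integration strategy can be illustrated on a small test problem. The following is a minimal sketch, not the code used for the paper: it uses a fixed-step fourth-order Runge-Kutta scheme without the automatic error control mentioned above, writes eq. (15) as a first-order system with `y[0]` the potential and `y[1]` its radial derivative (an assumed convention), and replaces the physical conductivity and source by a manufactured case with a known solution. All names are hypothetical.

```python
import math

def rk4_inward(rhs, r_start, r_stop, y, steps):
    """Integrate y' = rhs(r, y) from r_start down to r_stop with fixed-step RK4."""
    h = (r_stop - r_start) / steps          # negative step: inward integration
    r = r_start
    for _ in range(steps):
        k1 = rhs(r, y)
        k2 = rhs(r + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(r + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(r + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        r += h
    return y

def make_rhs(n, sigma, dsigma, source):
    """First-order form of eq. (15): y[0] = Z, y[1] = dZ/dr."""
    def rhs(r, y):
        s = sigma(r)
        d2z = (-(2.0 / r + dsigma(r) / s) * y[1]
               + n * (n + 1) * y[0] / r**2 + source(r) / s)
        return [y[1], d2z]
    return rhs

# Manufactured test problem (hypothetical, not the paper's sigma or F):
# with sigma = 1 and n = 1, Z(r) = exp(-r) satisfies eq. (15) when
# F(r) = exp(-r) * (1 - 2/r - 2/r**2).
rhs = make_rhs(1, lambda r: 1.0, lambda r: 0.0,
               lambda r: math.exp(-r) * (1.0 - 2.0 / r - 2.0 / r**2))

# Start far out with potential and derivative set to zero, integrate inwards.
z_inner, dz_inner = rk4_inward(rhs, 30.0, 1.0, [0.0, 0.0], 4000)
print(z_inner, math.exp(-1.0))
```

Starting from zero at the outer radius introduces only a small error here because the exact solution is already negligible there, which is the same reasoning used to justify the starting point of the integration in the text.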

Two other numerical problems arose in the evaluation of eqs. (37) and (38) for the even multipole fields, both partially caused by the inapplicability of the conductivity model near the explosion site. At some point within a 0.4 km radius around the explosion site, the conductivity will become very large both with respect to that of the Earth and absolutely. Within that radius, the potential in the air must be constant in the quasi-static approximation in order that the field fall to zero there. This implies that the potential $\Phi_1(\rho)$ will not be given by the negative of $\Phi(\rho)$, where $\Phi(\rho)$ is a solution to eqs. (52) with $\sigma'$ and $(F_n^0)'$ as given by eqs. (39) - (51). Therefore, the true value of $\Phi_1(\rho)$ in eq. (38) is effectively unknown at values of $r < 0.4$ km in the absence of a model for $\sigma$ and $\mathbf{J}_s$ near the explosion site, although the continuity requirements on the fields and potential place limits on the magnitude of the error. The problem was addressed by halting the integration of eq. (38) at $r = 0.4$ km. At worst, the vanishing of the field and the continuity conditions on the potential imply that the error could be no worse than that obtained by holding $\Phi_1$ in eq. (38) constant at its value at $\rho_0 = 0.4$ km and integrating. That is, the error term for $A_l$, $\epsilon_l$, resulting from the termination of the integration at $\rho = 0.4$ km, should obey the condition

$$|\epsilon_l| \le \frac{|\Phi_1(\rho_0)|\, \rho_0^{\,2l+2}}{2l+2}. \tag{53}$$

Equation (53) was evaluated for values of $\Phi$ arising from the original fields for $n = 2$ and $n = 0$ in eqs. (52) and found to be at least 3 orders of magnitude smaller than $A_l$ for each value of $l$. A more serious problem concerns the nature of the potential $\Phi_1(\rho)$ in eq. (38). It is evident that $A_l$ will be finite as $r \to \infty$ only if $\Phi_1(\rho)$ decays exponentially as a function of $\rho$. Since $\Phi_1(\rho)$ was obtained from the numerical solution of the field equations, eq. (52), its error terms can propagate through the integral $A_l$, coming to dominate over the true value, especially at high values of $\rho$. A related problem concerns the series, eq. (37). Since it represents a physical quantity, the potential, it must converge everywhere. However, at values of $r$ near the origin, it will diverge because the values of $\mathbf{J}_s$ and $\sigma$ are unbounded in that neighbourhood. For high values of $r$, the series will diverge because of the accumulation of numerical errors.

In order to limit the influence of these types of errors, the following procedure was adopted. Instead of evaluating $A_l$ for each value of $l$, the quantity $B'_l$

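was evaluated for each $r$ and $l$ of interest, with the factor $r^{-(2l+1)}$ effectively acting as an integrating factor. The ratio

$$R = \left| \frac{C'_{l+1}}{C'_l} \right| \tag{55}$$

where

$$C'_l = \frac{(2l+1)!}{2^{2l} (l!)^2}\, P_{2l+1}(\cos\theta)\, B'_l \tag{56}$$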

was computed at each value of $r$ and $\theta$ until $R > 1$. Since numerical trials had demonstrated that if $R > 1$ were true at $l = l_0$, it would also be true for all $l > l_0$, $l_0$, the value of $l$ at which $R > 1$, was taken to be the point at which numerical errors were beginning to force the divergence of the series. The series was then truncated at the last value of $l$ for which $R < 1$. Since the true value of $B'_{l+1}$ was assumed to be less than $B'_l$ in order to prevent divergence, this implies that the value calculated for $\Phi_1$ could differ from the true value by no more than $B'_{l_0 - 1}$. It should be noted that this procedure does not ensure convergence of the series eq. (37); it merely discards those series that are felt to be demonstrably divergent. Unfortunately, since at certain values of $r$ and $\theta$, $R > 1$ for the first two elements of the series, it is not possible even to estimate the magnitudes of the fields and potential there from eq. (37). The calculation of the $B'_l$ was done with Simpson’s second rule and a base point spacing of 0.025 km for the fields generated in reaction to the $n = 2$ component of the original field and a base point spacing of 0.05 km for those generated in reaction to the $n = 0$ component of the original field. The values of $B'_l$ were checked against numerical error by halving the size of the base point spacings.
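The two numerical devices described above, Simpson's second (3/8) rule with a check by halving the spacing, and truncation of the series once the ratio of successive terms exceeds 1, can be sketched as follows. This is an illustrative reading rather than the original code: the function names are hypothetical, the ratio is taken between successive series terms, and the partial sum is assumed to keep only the terms before the index $l_0$ at which the ratio first exceeds 1.

```python
import math

def simpson_38(f, a, b, panels):
    """Composite Simpson's second (3/8) rule using `panels` groups of 3 subintervals."""
    n = 3 * panels
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (2 if i % 3 == 0 else 3) * f(a + i * h)
    return 3.0 * h / 8.0 * total

def integrate_with_halving(f, a, b, panels, rel_tol=1e-6, max_halvings=12):
    """Halve the base point spacing until two successive estimates agree."""
    previous = simpson_38(f, a, b, panels)
    for _ in range(max_halvings):
        panels *= 2
        current = simpson_38(f, a, b, panels)
        if abs(current - previous) <= rel_tol * abs(current):
            return current
        previous = current
    return previous

def truncate_series(terms):
    """Sum terms C_0, C_1, ... and stop where |C_{l+1} / C_l| first exceeds 1.

    Returns (partial_sum, terms_kept). If the first ratio already exceeds 1,
    no estimate is possible and (None, 0) is returned.
    """
    for l in range(len(terms) - 1):
        if abs(terms[l + 1]) > abs(terms[l]):   # R > 1 at l = l0
            if l == 0:
                return None, 0
            return sum(terms[:l]), l            # keep terms with l < l0
    return sum(terms), len(terms)

print(integrate_with_halving(math.sin, 0.0, math.pi, 4))
print(truncate_series([1.0, 0.5, 0.25, 0.3, 5.0]))
```

The truncation rule does not make a divergent series converge; as noted in the text, it only discards the part of the sum that numerical error has visibly corrupted.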

4  Numerical  Results  and  Analysis 

Figures 2-5 show the potential, total electric field, radial electric field and tangential field for the dipole electric field (i.e. for $n = 1$ in eq. (15)) as a function of the radial co-ordinate $r$ at various angles for a nuclear explosion of 10 megatons evaluated at a retarded time of 1 msec after the blast. Unless otherwise stated, it will henceforth be assumed that all of the fields discussed in this section are evaluated at the same retarded time of 1 msec, and that the source currents are those generated by a 10 megaton thermonuclear explosion (i.e. $Y_0 = 10^4$ in eq. (43)). One also needs to have values for $S_0$, $\rho_0$, $N_a$, $\alpha_e$, $\mu_e$, $\gamma_i$, and $\mu_i$. Following Grover (1980), $S_0$ in eq. (42) was set to $1.1 \times 10^{30}$ ion-pairs/m-sec, a value appropriate to a 10 megaton burst. The values assumed for the other quantities were also those chosen by Grover (1980):


$N_a = 2.0 \times 10^{23}$ neutron/kT,
$\rho_0 = 1.225$ mg/cm³,
$\alpha_e = 1.5 \times 10^{8}$ sec⁻¹,
$\mu_e = 0.25$ m²/V-sec,
$\gamma_i = 2.0 \times 10^{-12}$ m³/sec,
$\mu_i = 2.5 \times 10^{-4}$ m²/V-sec.


In reality, these values depend upon quantities such as the field strength, air density and fraction of water vapour present. However, the average values will suffice as a first approximation. The gamma dose attenuation length $\lambda$ was set to 320 metres for all calculations.
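With these parameter values, the ionisation rate and the electronic part of the conductivity can be evaluated directly. The sketch below assumes the forms $S = S_0 \exp(-r/\lambda)/r^2$ for eq. (42) and $\sigma_e = e \mu_e S / \alpha_e$ for eq. (40), with $r$ in metres; the ionic contribution is not included, and the constant and function names are illustrative.

```python
import math

E_CHARGE = 1.602e-19      # electron charge in coulombs
S0 = 1.1e30               # ionisation constant, value quoted from Grover (1980)
LAMBDA = 320.0            # gamma dose attenuation length in metres
MU_E = 0.25               # electron mobility in m^2/(V s)
ALPHA_E = 1.5e8           # electron attachment rate in 1/s

def ionisation_rate(r):
    """Assumed form of eq. (42): S = S0 * exp(-r / lambda) / r**2."""
    return S0 * math.exp(-r / LAMBDA) / r**2

def electronic_conductivity(r):
    """Assumed form of eq. (40): sigma_e = e * mu_e * S / alpha_e."""
    return E_CHARGE * MU_E * ionisation_rate(r) / ALPHA_E

for r in (500.0, 900.0, 1300.0):
    print(r, ionisation_rate(r), electronic_conductivity(r))
```

The printed values fall steeply with radius because of the combined exponential attenuation and inverse-square factor in the ionisation rate.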




Table 1: Comparisons of Calculated EMP Electric Fields

Radius   Total Field     Total Field     Total Field     Total Field
                                                         n = 1, θ = 0
         (Wyatt 1980)    (Grover 1980)   (Downey 1983)   (Present Work)
(m)      (kV/m)          (kV/m)          (kV/m)          (kV/m)

 500     390             45              23              304
 900     164             21              19              128
1300     114             15              13               62

The header of the source table also carries the label θ = 0 on two of the three cited columns.

One  test  of  any  model  of  EMP  is  whether  it  is  capable  of  producing  fields  of 
sufficient  intensity,  usually  regarded  as  being  in  excess  of  100  kV/m,  to  cause  the 
lightning  observed  during  several  tests.  As  can  be  seen  from  Figs.  2-5,  the  total 
field  reaches  a  maximum  of  ~  300  kV/m  at  .5  km  and  falls  to  less  than  1  kV/m 
at 3.5 km. It is well known (e.g. Hodgdon 1984, Longmire and Gilbert 1980) that the dominant field is dipolar because of the $\cos\theta$ dependence of the current density.
Hence,  the  fields  displayed  in  Figs.  2-5  should  constitute  the  greater  part  of  the 
total  electric  field.  It  is  encouraging  that  the  magnitudes  calculated  are  in  excess  of 
those  needed  to  produce  nuclear  lightning  over  much  of  the  range  in  which  they  were 
observed  (900  -  1400  m  from  the  blast)  at  time  scales  of  1  msec  (Wyatt  1980).  The 
values  shown  for  the  fields  in  Figs  2-5  are  of  the  same  order  of  magnitude  as  those 
obtained  for  the  same  conditions  by  Wyatt  (1980),  using  two  separate  conductivity 
models.  Wyatt’s  values  for  the  fields  are  listed  in  Table  1,  and  compared  with  the 
ones  obtained  here,  as  well  as  with  Downey’s  (1983)  and  Grover’s  (1980)  values  for 
the  total  fields.  These  values  are  necessarily  adequate  only  for  order  of  magnitude 
comparisons,  because  of  the  angular  dependence  of  some  of  the  field  values.  As 
well,  it  should  be  emphasized  that  the  values  included  from  Figs.  2-5  of  this  paper 
are  only  the  dipolar  component  of  the  total  field.  Nonetheless,  it  is  evident  that 
there  is  a  significant  difference  among  the  results  obtained  in  the  four  works  cited. 
In  Downey’s  and  Grover’s  cases,  the  results  are  likely  attributable  to  the  different 
conductivity  models  used.  Downey  (1983)  used  detailed  fits  to  the  expected  form 
of  the  conductivity,  taking  into  account  the  air  chemistry,  as  opposed  to  Grover’s 
more  approximate  model.  Even  so,  Downey  only  found  a  variation  of  10%  -  30% 
between his values and Grover's. From this, it seems likely that the true form of
the air conductivity would be quite important in any calculation of the fields.

Figures 6-9 show the sextupole fields and potential (i.e. for $n = 3$ in eq. (15)).
As can be seen, the fields are considerably smaller than for the dipolar field, but
still significant.

Figures 10-17 show the fields and potential for the quadrupolar fields (i.e. $n = 2$
in eq. (15)). Figures 10-12 show the fields and potentials obtained from the solution
of  eq.  (15):  that  is,  they  show  the  fields  and  potentials  generated  by  the  source 
currents  without  the  addition  of  the  fields  and  potentials  due  to  the  charge  density 
on  the  surface  of  the  Earth.  Figures  13-14  show  the  fields  generated  by  the  surface 
charge  density  created  by  the  n  =  2  fields  for  various  angles  of  interest. 

Three  features  are  of  special  interest  in  these  figures.  The  first  is  that  the 
magnitudes of these fields are frequently of the same order as the fields
which induced the surface charge density. This implies that the whole complex
of  fields  generated  by  the  even  multipoles  of  the  source  current  can  contribute 
significantly  to  the  total  fields,  acting  at  some  angles  to  increase  the  magnitude  of 
the  fields  and  at  others  to  decrease  them. 

The other two features of interest concern the two types of anomalous behaviour
of the curves at low values of $r$. Examples of one type are the jumps exhibited near
1100 metres and, less noticeably, near 1600 metres, in the graph of the tangential
field at $\theta = 89^\circ$ in Fig. 14. These arise from the accumulation of numerical errors in
the calculation of the series coefficients $B_l$, as discussed above. The jumps occur at
values of $r$ at which one is able to truncate the series, eq. (37), at a higher value of
$l$ than at the preceding value of $r$, and hence are a representation of the truncation
error. Because the convergence of the series is most difficult to assure at low values
of $r$, this error is most severe there. The second type of anomalous behaviour in
the curves is the relatively abrupt change exhibited by the radial fields
for $\theta = 0$ and $\theta = 30^\circ$ for $r < 1500$ metres in Fig. 13, and by the tangential field for
$\theta = 60^\circ$ in Fig. 14. These features do not occur at places where the number of terms
in the series has been increased and so do not seem to be due to truncation errors.
They may, however, be an artefact of the models chosen for $\sigma$ and $J$. As discussed
above, the expressions for the source current density and the conductivity become
increasingly inaccurate near the site of the explosion, and the peculiar behaviour of
the curves may be a reflection of that.

Figures  15-17  display  the  effects  of  the  potential  and  fields  produced  by  the 
surface charge density on the ones produced by the source currents. As noted above,
the  effects  of  the  surface  charge  are  significant.  Figure  17  is  of  special  interest,  since 
it  displays  the  potential  near  the  Earth’s  surface.  Since  the  boundary  conditions 
require  that  the  total  potential  be  zero  at  the  surface,  one  would  expect  the  two 
potentials  to  counteract  each  other  to  some  degree  near  the  Earth’s  surface  as  they 
in  fact  do.  The  cancellation  would  be  expected  to  be  complete  only  at  the  Earth’s 
surface,  where,  of  course,  the  expression  for  the  field  induced  by  the  surface  charges, 
eq.  (37),  is  not  valid. 

Figures 18-24 show the fields and potential obtained for the monopole portion
of the fields (i.e. $n = 0$ in eq. (15)). This component of the field is of interest both
because of its relatively large magnitude and for its peculiar structure. Since
the potential generated by the source current density has no $\theta$ dependence, the
only tangential field present is due entirely to the surface charge density. Figures
18-19 show the radial field and potential due to the current density, and Figs. 20-21
the fields created by the surface charges. Again, many of the same curious
features observed in the quadrupolar fields and potential are present here, and for
the same reasons. It should also be remarked that the convergence of the series and
integrals connected with eqs. (37) and (38) is much less satisfactory here than in
the quadrupolar case. When the base point spacing in the numerical calculation of
$B_l$ was halved for $n = 2$, the difference between the two calculations of the field,
using the two values for each $B_l$ in eq. (37), was on the order of 1 volt/metre. When
the same procedure was followed for the $n = 0$ case, the difference between the two
calculations of the field could be as high as 60 volts/metre, although this lessened
to 2-3 volts/metre at $r = 5$ km. This probably reflects the difficulty of obtaining
a fit for the $\theta$-independent parts of the conductivity and source currents. Finally,
Figs. 22-24 show the resulting fields and potentials when those due to the source
currents and the surface charges are combined.
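
The step-halving check described in the paragraph above can be illustrated with a generic sketch. Everything in it is an assumption made for illustration: the integrand is a hypothetical smooth function (not the paper's expression for the series coefficients), and the composite trapezoid rule stands in for whatever quadrature the authors used. The point is only the procedure: compute once, halve the base point spacing, and treat the difference as a convergence indicator.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal panels on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total

def halving_difference(f, a, b, n):
    """Return (coarse, fine, |fine - coarse|) when the spacing is halved."""
    coarse = trapezoid(f, a, b, n)
    fine = trapezoid(f, a, b, 2 * n)
    return coarse, fine, abs(fine - coarse)

def integrand(x):
    # Hypothetical stand-in for a coefficient integrand; not from the paper.
    return math.exp(-x) * math.cos(x)

coarse, fine, diff = halving_difference(integrand, 0.0, 5.0, 50)
print(f"coarse={coarse:.6f} fine={fine:.6f} difference={diff:.2e}")
```

A small difference between the two passes suggests, but does not prove, that the spacing is adequate; a large one, as reported for the monopole case, flags poor convergence.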

5  Conclusions 

In  this  work,  it  has  been  demonstrated  that  the  quasi-static  electric  fields  produced 
by  an  explosion  contain  components  that  depend  on  both  the  odd  and  even  surface 
spherical  harmonics,  and  that  this  remains  true  even  if  the  explosion  occurs  near 
a  good  conductor.  In  that  event,  the  even  multipole  fields  induce  a  surface  charge 
density  which  cancels  the  radial  field  at  the  surface  of  the  conductor,  but  which 
leaves  a  non-zero  even  multipole  field  elsewhere  in  space. 

Expressions  for  the  excitation  function  for  the  EMP  in  terms  of  the  surface 
spherical  harmonics  were  obtained,  and  used,  along  with  a  simple  model  of  ionic  and 
electronic  conductivity,  to  obtain  values  for  the  electric  fields  generated  by  a  typical 
explosion.  It  was  found  that  the  dipole  field  dominated,  but  that  the  contribution 
of  the  other  multipole  fields  to  the  total  field  was  significant.  In  particular,  the 
calculated  values  of  the  field  were  sufficient  to  produce  the  lightning  which  has 
been  observed  to  accompany  nuclear  explosions.  This  result  is  in  agreement  with 
the calculations performed by Wyatt (1980) but contradicts those done by Downey
(1983)  and  Grover  (1980).  The  difference  is  probably  attributable  to  a  different 
set  of  initial  conditions  and  atmospheric  conductivity  model.  In  passing,  it  should 
also  be  noted  that  the  computational  algorithm  developed  in  this  work  does  not 
require  a  knowledge  of  the  initial  conditions  at  the  blast,  but  only  of  those  at  large 
distances  from  the  explosion. 

Efforts  are  currently  being  made  to  extend  this  work  by  incorporating  the  effects 
of  the  induced  magnetic  fields  and  more  accurate,  self-consistent  models  for  the 
conductivity  and  source  currents. 

ON FATIGUE LIFE PREDICTION IN THICK-WALLED CYLINDERS


S.  L.  Pu  and  P.  C.  T.  Chen 
U.S.  Army  Armament,  Munitions,  and  Chemical  Command 
Armament  Research,  Development,  and  Engineering  Center 
Close  Combat  Armaments  Center 
Benet  Weapons  Laboratory 
Watervliet,  NY  12189-4050 


ABSTRACT. The large variation in stress intensity factors corresponding
to various material models for a single, radial, straight-fronted crack in a
pressurized, partially autofrettaged cylinder leads to a drastic difference in
the fatigue life predictions. None of the predicted lives agree with experimental
results. Possible explanations of the discrepancy are given and
corresponding correction factors are introduced. The predicted lives based on
the corrected stress intensity ranges are reasonably close to a set of
well-documented experimental results of Throop and Fujczak.

I. INTRODUCTION. Both finite element and modified mapping collocation
methods have been used to obtain accurate stress intensity (K) solutions for
pressurized autofrettaged thick cylinders with radial cracks [1,2]. The use
of weight functions has extended the two-dimensional K solutions to more
refined material models including the reverse yielding caused by a high
Bauschinger effect [3]. Several papers have used the stress intensity factors
to estimate fatigue lives of cannon tubes [4-6]. The calculations underestimate
the measured lives for pressurized cylinders, while they overestimate the
experimental results for pressurized and autofrettaged cylinders [5]. The
disagreement between measured and calculated lives is reduced in [6] by
introducing a fraction of the negative portion of K values as a part of the K
range.

The shallow crack approximations for K solutions were used and a linear
approximation for the Bauschinger effect on residual hoop stress was assumed in
[5] and [6]. The accurate two-dimensional K solutions affected by a significant
Bauschinger effect using elastic-plastic analysis were used in [7] to
indicate the drastic effect of reverse yielding on the life prediction.
Neither shape factors nor a fraction of negative K were considered in [7].

In this paper a life prediction formula similar to that used in [7] is
employed to check the calculated lives against the experimental results of Throop [8]
and Throop and Fujczak [9]. The stress intensity factors for surface cracks
of elliptical shape are approximated by the two-dimensional stress intensity
factors obtained in [3] multiplied by respective shape factors for pressure
and for residual stress given in [5]. The fraction of negative K included in
the K range varies from one at the notch boundary to zero at a crack depth far
away from the notch. The variation of this fraction is assumed to be $1/r^2$
where $r$ is the distance between the fatigue crack front and the notch front.
Another modification is to use an initial crack depth much deeper than the
notch depth. This avoids counting the large number of cycles required in
experiments to initiate a single continuous crack front along the notch boundary
(it is considered likely [5] that multiple, small, semi-elliptical cracks are initiated
along the notch boundary prior to linking together to form a single crack).
With these modifications the calculated lives agree reasonably well with
experimental results for all three crack shapes: long curved, semi-elliptical,
and semi-circular, used in [8] and [9].

II. FATIGUE LIFE PREDICTION. The integration of Paris' formula

$$\frac{da}{dN} = C(\Delta K)^m \tag{1}$$

for fatigue growth rate of a crack subjected to cyclic loading is usually used
to determine the number of fatigue cycles required to grow a crack from an
initial depth $a_i$ to a final depth $a_f$:

$$N = N_f - N_i = \int_{a_i}^{a_f} \frac{da}{C(\Delta K)^m} \tag{2}$$

where $C$ and $m$ are material constants and $\Delta K$ is the range of stress intensity
defined by

$$\Delta K = K_{max} - K_{min} \tag{3}$$

$K_{max}$ and $K_{min}$ are the maximum and minimum values of $K$ in a loading cycle. Assume
that a crack face is a geometric plane and there is no possibility of
interpenetration under compression. This leads to the conclusion that $K_{min}$ cannot
be negative. In the case of repeated firing of cannon tubes

$$K_{min} = 0 \tag{4}$$

is used for both autofrettaged and nonautofrettaged tubes. If $K_p$ and $K_R$
denote mode I $K$ values corresponding to an internal pressure and a residual
hoop stress respectively, then

$$K_{max} = K_p \tag{5a}$$

for nonautofrettaged cylinders, and

$$K_{max} = K_p + K_R \tag{5b}$$

for autofrettaged cylinders. $K_p$ and $K_R$ are usually expressed in a
dimensionless form denoted by $f_{Kp}$ and $f_{KR}$, respectively,

$$f_{Kp} = \frac{K_p}{p\sqrt{\pi a}}, \qquad f_{KR} = \frac{K_R}{\sigma_0\sqrt{\pi a}} \tag{6}$$

where $\sigma_0$ is the yield stress, $p$ is the internal pressure applied to a tube,
and $p$ is related to $\sigma_0$ by a load factor $f_L = \sigma_0/p$. By virtue of Eqs. (3),
(4), and (5b)

$$\Delta K = \left(\frac{f_{Kp}}{f_L} + f_{KR}\right)\sigma_0\sqrt{\pi a} \tag{7}$$

For a small crack growth from $a_j$ to $a_{j+1}$, $f_{Kp}$ and $f_{KR}$ are assumed to be
constants which are taken as the mean values

$$\bar f_{Kp} = \tfrac{1}{2}\left[f_{Kp}(a_j) + f_{Kp}(a_{j+1})\right], \qquad \bar f_{KR} = \tfrac{1}{2}\left[f_{KR}(a_j) + f_{KR}(a_{j+1})\right] \tag{8}$$

The integration of Eq. (2) becomes

$$\frac{N}{C_0} = \sum_{j=1}^{n} 2^{-m}\left(\frac{\bar f_{Kp}}{f_L} + \bar f_{KR}\right)^{-m}\left(\alpha_j^{1-m/2} - \alpha_{j+1}^{1-m/2}\right) \tag{9}$$

where

$$C_0 = \frac{2^{m+1}\, t^{1-m/2}}{(m-2)\, C\, (\sigma_0\sqrt{\pi})^m} \tag{10}$$

with $t$ denoting the wall thickness and $\alpha = a/t$. The fraction $\alpha$ is used since
the crack depth is usually expressed as a fraction of wall thickness $t$.
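
The piecewise integration of Paris' formula can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the sum is read as $N/C_0 = \sum_j 2^{-m}(\bar f_{Kp}/f_L + \bar f_{KR})^{-m}(\alpha_j^{1-m/2} - \alpha_{j+1}^{1-m/2})$, and the scale factor `c0` is the one obtained here by requiring consistency with Eq. (2) for $m > 2$. The sample $f_{Kp}$ values are made up.

```python
import math

def cycles_over_c0(alpha, f_kp, f_kr, f_l, m):
    """N / C0 summed over the crack-depth nodes alpha (increasing a/t values).

    f_kp, f_kr are the dimensionless K values at those nodes; the mean of the
    two end values is used on each interval, as in the text.
    """
    total = 0.0
    for j in range(len(alpha) - 1):
        mean_kp = 0.5 * (f_kp[j] + f_kp[j + 1])
        mean_kr = 0.5 * (f_kr[j] + f_kr[j + 1])
        bracket = mean_kp / f_l + mean_kr
        if bracket <= 0.0:
            raise ValueError("non-positive stress intensity range")
        total += (2.0 * bracket) ** (-m) * (
            alpha[j] ** (1.0 - m / 2.0) - alpha[j + 1] ** (1.0 - m / 2.0)
        )
    return total

def c0(t, c, sigma0, m):
    """Scale factor derived here for consistency with Eq. (2); requires m > 2."""
    return 2.0 ** (m + 1) * t ** (1.0 - m / 2.0) / (
        (m - 2.0) * c * (sigma0 * math.sqrt(math.pi)) ** m
    )

# Hypothetical inputs: constant f_Kp, no residual stress term.
alpha = [0.1, 0.2, 0.3, 0.4, 0.5]
f_kp = [2.0] * 5
f_kr = [0.0] * 5
n_cycles = c0(0.09, 6.52e-12, 1175.0, 3) * cycles_over_c0(alpha, f_kp, f_kr, 3.55, 3)
print(f"predicted cycles: {n_cycles:.0f}")
```

With constant dimensionless factors the sum telescopes, so the result can be checked against the closed-form integral of Eq. (2); that is a convenient sanity test before real tabulated values are used.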


Substituting calculated values of $f_{Kp}$ and $f_{KR}$ [3] corresponding to
various material models in Eqs. (8) and (9), we obtain the fatigue crack
growth ($\alpha$ versus $N/C_0$) graph shown in Figure 1 for a single, two-dimensional
straight-fronted through crack in a cylinder of diameter ratio two. In this
figure, $f$ is the Bauschinger effect factor. The dotted lines are for
idealized material without Bauschinger effect ($f = 1$), while the dashed lines
are for $f = 0.38$ (100 percent overstrain) and $f = 0.44$ (60 percent
overstrain), respectively. The dashed lines with $m' = 0$ correspond to
elastic-perfectly plastic behavior during reverse yielding. The dashed lines
with $m' = 0.3$ indicate the difference in predicted cycles when
strain-hardening is considered in reverse yielding. The graph shows the significant
difference of autofrettage effect of various material models on fatigue
cycles. Such a drastic difference is not supported by experimental results.
Corrective parameters must be introduced to correlate the calculated fatigue
cycles with observed ones in laboratory tests.

III. SHAPE FACTORS. The ratio of stress intensity factor for a surface
crack to that of a two-dimensional through crack is called the shape factor.
The stress intensity factor varies along the crack front of a surface crack.
It changes with crack shape and under different loadings. If the variation of
stress intensity along the crack front is important, the three-dimensional K
solutions for the surface crack should be obtained. If an estimate of K at a
point, say the deepest point of the surface crack, is needed, then a reasonable
estimate of shape factor is useful. An extensive study of shape factors
has been published by Newman and Raju [10] for semi-elliptical cracks in a
flat plate under tension or bending. There is no comparable study for such a
crack in a thick cylinder. Parker et al. obtained estimates of shape factors
for semi-elliptical cracks of various aspect ratios in a pressurized and
autofrettaged cylinder [5] from judicious use of results reported in [10]. More
accurate three-dimensional K solutions should be obtained to check these
estimates. Until more accurate and reliable shape factors become available,
values close to those given by (a) and (c) of Figure 7 in [5] are used for $f_{Sp}$
and $f_{SR}$, respectively, in this study. Multiplying $f_{Kp}$ and $f_{KR}$ by their
shape factors, Eq. (7) becomes


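$$\Delta K = \left(\frac{f_{Sp} f_{Kp}}{f_L} + f_{SR} f_{KR}\right)\sigma_0\sqrt{\pi a} \tag{11}$$

The following mean values are used in Eq. (11) for a small crack growth from
$a_j$ to $a_{j+1}$.

$$\overline{f_{Sp} f_{Kp}} = \tfrac{1}{2}\left[f_{Sp}(a_j) f_{Kp}(a_j) + f_{Sp}(a_{j+1}) f_{Kp}(a_{j+1})\right]$$
$$\overline{f_{SR} f_{KR}} = \tfrac{1}{2}\left[f_{SR}(a_j) f_{KR}(a_j) + f_{SR}(a_{j+1}) f_{KR}(a_{j+1})\right] \tag{12}$$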


IV. NEGATIVE K FACTOR $f_N$. For a fatigue crack which is considered as a
geometrical plane with no thickness, $K_{min} = 0$ is used in Eq. (3) to obtain the K
range. For a notch with finite thickness, the upper and lower planes of the
notch may have normal displacement in both directions. The argument used to
limit $K_{min}$ to zero is not valid at the notch front. In fact, the full negative K
should be used for a crack starting at the notch. At some depth, the notch
effect may become negligibly small, and $K_{min} = 0$ may again be used in Eq. (3)
for $\Delta K$. Kendall has argued that a portion of the negative K must be included
in calculating the K range [6]. He introduced a constant fraction which multiplies the
negative K ($K_R$) corresponding to the compressive residual stress due to
autofrettage to give the $K_{min}$ in $\Delta K$. In our hypothesis, the fraction varies with
the depth of the fatigue crack. As a first approximation, the fraction is
taken as $f_N = (1 + r/r_n)^{-2}$ where $r_n$ is the depth of the notch and $r$ is the depth of
the crack measured from the notch front. At the notch front $r = 0$ and $f_N = 1$,
so the full negative K is taken as $K_{min}$. When $r = r_n$ the fatigue crack has grown to
a depth equal to the notch depth, and this approximation gives $f_N = 1/4$. It is also
assumed that $f_N = 0$ when $r > 2r_n$. A linear variation of $f_N$ was another
approximation examined; it underestimated the measured fatigue lives.
Incorporating $f_N$, Eq. (11) becomes

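$$\Delta K = \left[\frac{f_{Sp} f_{Kp}}{f_L} + (1 - f_N) f_{SR} f_{KR}\right]\sigma_0\sqrt{\pi a} \tag{13}$$

Equation (9) can be expressed explicitly

$$\frac{N}{C_0} = \sum_{j=1}^{n}\left[\frac{\tilde f_{Kp}(\alpha_j) + \tilde f_{Kp}(\alpha_{j+1})}{f_L} + \tilde f_{KR}(\alpha_j) + \tilde f_{KR}(\alpha_{j+1})\right]^{-m}\left(\alpha_j^{1-m/2} - \alpha_{j+1}^{1-m/2}\right) \tag{14}$$

where the abbreviations $\tilde f_{Kp}$, $\tilde f_{KR}$ are

$$\tilde f_{Kp}(\alpha_j) = f_{Sp}(\alpha_j) f_{Kp}(\alpha_j), \qquad \tilde f_{KR}(\alpha_j) = \left[1 - f_N(\alpha_j)\right] f_{SR}(\alpha_j) f_{KR}(\alpha_j) \tag{15}$$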
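
As a minimal sketch of how the negative K fraction enters the stress intensity range, the two functions below implement $f_N = (1 + r/r_n)^{-2}$ with the cutoff $f_N = 0$ for $r > 2r_n$, and the range of Eq. (13). The function names and the sample numbers are hypothetical; only the formulas come from the text.

```python
import math

def negative_k_fraction(r, r_n):
    """Fraction f_N of the negative K retained in the K range.

    r   : crack depth measured from the notch front
    r_n : notch depth (same units as r)
    """
    if r < 0.0 or r_n <= 0.0:
        raise ValueError("r must be >= 0 and r_n must be > 0")
    if r > 2.0 * r_n:
        return 0.0  # notch effect assumed negligible beyond twice the notch depth
    return (1.0 + r / r_n) ** -2

def delta_k(f_sp, f_kp, f_sr, f_kr, f_n, f_l, sigma0, a):
    """Stress intensity range of Eq. (13) at crack depth a."""
    bracket = f_sp * f_kp / f_l + (1.0 - f_n) * f_sr * f_kr
    return bracket * sigma0 * math.sqrt(math.pi * a)

# Hypothetical values; f_kr is negative for a compressive residual stress.
at_notch = delta_k(0.7, 2.0, 0.7, -0.5, negative_k_fraction(0.0, 6.4), 3.55, 1175.0, 0.009)
far_away = delta_k(0.7, 2.0, 0.7, -0.5, negative_k_fraction(20.0, 6.4), 3.55, 1175.0, 0.009)
print(at_notch, far_away)
```

At the notch front the residual term drops out of the bracket, so the range is larger there than far from the notch, where the compressive residual stress reduces it.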
V. LABORATORY SPECIMENS AND MEASURED LIVES. Throop [8] and Throop and
Fujczak [9] obtained a series of experimental results which relate fatigue
crack lives to crack shape and extent of autofrettage for thick-wall
cylinders. The cylinders were 0.76 m (30 inches) in length, 180 mm (7.1 inches)
bore diameter, 360 mm (14.25 inches) outside diameter and were fatigue cracked
from longitudinal internal notches. Three initial notch geometries were used:
semi-circular, 100 mm (4-inch) and 500 mm (20-inch) long notches produced by
electrical discharge machining. They were 6.4 mm (1/4-inch) deep by 0.76 mm
(0.03-inch) wide, the semi-circular notch being a 13 mm (1/2-inch) diameter
half-penny shape. Fatigue cracks grown from the initial notches were monitored
periodically for depth and shape with ultrasonics as the cylinder was
repeatedly pressurized to 330 MPa (48 Ksi). The cylinder material was ASTM
A723 forged steel, with yield strength of 1175 MPa, -40°C Charpy impact energy
of 34 J, and reduction in area of 50 percent. A schematic diagram of a typical
cylinder with a simple initial notch and the growth of a 500 mm (20-inch) long
notch from 6.4 mm (1/4-inch) initial depth by repeated pressurization is shown
in Figure 2. The measured crack depth versus corresponding number of fatigue
loadings is shown in Figure 3 for a single-notched cylinder with 0, 30, and 60
percent overstrains for three notch geometries. Figures 2 and 3 are
reproduced from Figures 1 and 6 of [9], respectively.

Since the fatigue crack is not likely to start from the notch immediately,
an adjustment of experimental results is made by subtracting the
number of cycles required to grow from initial notch depth to an initial crack
depth ($a_i$) from the number of cycles to grow from initial notch depth to a
final crack depth ($a_f$). The initial crack depth, which should be reasonably
larger than the initial notch depth (6.4 mm in this experimental study), is
arbitrarily taken as $a_i = 0.1t = 9$ mm. The original experimental data are
available only in graphs. To improve the accuracy of readings from graphs,
the graphs were first enlarged and the average was used from values obtained
from different graphs published in different papers [4,5,9] for the same set
of experimental data. The adjusted experimental results are shown by dots in
Figures 4, 5, 6, and 7.

VI. PREDICTED LIVES FOR THE EXPERIMENTS. The material constants for the
steel used in the experiments are $m = 3$ and $C = 6.52 \times 10^{-12}$ for crack growth in
meters per cycle and $\Delta K$ in MPa$\sqrt{\text{m}}$ (or $C = 3.4 \times 10^{-10}$ for crack growth in
inches per cycle and $\Delta K$ in ksi$\sqrt{\text{in}}$). Using $f_L = 3.55$ and values of $f_{Sp}$ and
$f_{SR}$ in Table I, and assuming $f_N = (1 + r/r_n)^{-2}$ for $0 \le r \le 2r_n$ and $f_N = 0$ for
$r > 2r_n$, Eqs. (14) and (15) can be evaluated with known discrete values of $f_{Kp}$
and $f_{KR}$. Values of $f_{Sp}$ and $f_{SR}$, given in Table I, are obtained from [5] for
different crack geometries. Discrete values of $f_{Kp}$ and $f_{KR}$ can be computed by
the method described in [3]. The calculated lives and measured lives are
plotted in Figures 4, 5, and 6 for three crack geometries, respectively. For
nonautofrettaged cylinders (zero percent overstrain), there is only one set of
calculated results, shown in a solid line, for each crack configuration. For
60 percent overstrain, solid lines are for idealized material without
considering the Bauschinger effect, while two dashed lines in each figure are
predicted lives with some reverse yielding. The two dashed lines differ in $m'$
values, $m' = 0$ or 0.3, where $m'E$ is the slope of strain-hardening during
reverse yielding.
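
The two quoted values of the Paris constant can be cross-checked with a short unit conversion, and the load factor can be compared with the ratio of yield strength to test pressure reported for the specimens. The sketch below is only an arithmetic check; the function name and the conversion constants (1 ksi = 6.894757 MPa, 1 inch = 0.0254 m) are supplied here, not taken from the paper.

```python
import math

KSI_TO_MPA = 6.894757   # 1 ksi expressed in MPa
INCH_TO_M = 0.0254      # 1 inch expressed in metres

def paris_c_to_si(c_us, m):
    """Convert C from (in/cycle, ksi*sqrt(in)) units to (m/cycle, MPa*sqrt(m))."""
    ksi_sqrt_in_in_mpa_sqrt_m = KSI_TO_MPA * math.sqrt(INCH_TO_M)
    return c_us * INCH_TO_M / ksi_sqrt_in_in_mpa_sqrt_m ** m

c_si = paris_c_to_si(3.4e-10, 3)
load_factor = 1175.0 / 330.0   # yield strength / internal pressure, both in MPa

print(f"C in SI units: {c_si:.3e}")
print(f"sigma_0 / p:   {load_factor:.3f}")
```

Both results agree with the quoted $C = 6.52 \times 10^{-12}$ and $f_L = 3.55$ to within about one percent, which is consistent with rounding of the inputs.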
When $m' = 0$, the material behaves as elastic-perfectly plastic in
reverse yielding. The magnitude of the compressive residual stress in the
tangential direction near the bore is smaller than that in a strain-hardening
material [3]. The strain-hardening reduces the adverse effect of reverse
yielding due to the Bauschinger effect (Figure 1). Figures 4 through 6 show an
overall effect of all factors. Figure 7 shows the effect of each factor on
the semi-elliptical surface crack in a pressurized cylinder with 60 percent
overstrain. The dotted curve, curve 1, gives the calculated lives for an
idealized material with no Bauschinger effect. No correction factors are
used. The stress intensity range is based on two-dimensional stress intensity
factors and $K_{min} = 0$. If the change in residual hoop stress due to the
Bauschinger effect (Bauschinger effect factor $f = 0.44$) is considered, the
calculated lives are shown as a dashed curve, curve 2, where $m' = 0.3$ is used.
The large difference between curves 1 and 2 shows the significant effect of
reverse yielding on fatigue lives. If shape factors $f_{Sp}$ and $f_{SR}$ are
considered in addition to the Bauschinger effect, the predicted lives
change from curve 2 to curve 3. The shape factors increase the fatigue
lives. Finally, if the correction factor $f_N$ is also taken into
consideration, the predicted lives are shown as the solid line, curve 4, which is
close to the experimental results shown by dots in Figure 7. Similar graphs,
obtained for the long curved crack and the semi-circular crack, are omitted
since they indicate similar effects of each correction factor.


TABLE I. VALUES OF f_Sp AND f_SR FOR DIFFERENT CRACK GEOMETRIES

             20-Inch Long Notch     4-Inch Long Notch      Semi-Circular Notch
α = a/t      f_Sp         f_SR      f_Sp        f_SR       f_Sp        f_SR

0.1          0.95         0.95      0.72        0.69       0.57        0.58
0.2          0.90         0.90      0.64        0.60       0.53        0.52
0.3          0.85         0.85      0.61        0.55       0.51        0.42
0.4          0.85         0.80      0.57        0.47       0.48        0.34
0.5          (illegible)  0.75      0.56        0.38       0.44        0.24
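
Because the shape factors are tabulated only at a few crack depths, some rule is needed to evaluate them between entries. The sketch below uses piecewise-linear interpolation on the 4-inch long notch columns of Table I; the interpolation rule and the function name are assumptions made here for illustration, since the paper does not state how intermediate values were obtained.

```python
from bisect import bisect_right

# Crack depth ratios and 4-inch long notch shape factors from Table I.
ALPHA = [0.1, 0.2, 0.3, 0.4, 0.5]
F_SP_4IN = [0.72, 0.64, 0.61, 0.57, 0.56]
F_SR_4IN = [0.69, 0.60, 0.55, 0.47, 0.38]

def interpolate(alpha, xs, ys):
    """Piecewise-linear interpolation of tabulated ys at alpha (assumed rule)."""
    if not xs[0] <= alpha <= xs[-1]:
        raise ValueError("alpha outside the tabulated range")
    k = min(bisect_right(xs, alpha), len(xs) - 1)   # index of right-hand node
    x0, x1 = xs[k - 1], xs[k]
    w = (alpha - x0) / (x1 - x0)
    return ys[k - 1] + w * (ys[k] - ys[k - 1])

f_sp_mid = interpolate(0.25, ALPHA, F_SP_4IN)
f_sr_mid = interpolate(0.25, ALPHA, F_SR_4IN)
print(f_sp_mid, f_sr_mid)
```

The function refuses to extrapolate, since the table gives no information outside 0.1 to 0.5 of the wall thickness.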

VII.  CONCLUSIONS.  The  fatigue  life  of  a  thick-walled  cylinder  can  be 
predicted  reasonably  well  by  the  integration  of  Paris'  formula  of  crack  growth 
rate  of  a  fatigue  crack  under  cyclic  loading,  if  a  modified  stress  intensity 
range  is  used.  The  stress  intensity  range  is  obtained  by  multiplying  two- 
dimensional  stress  intensity  factors  by  proper  shape  factors  and  some  negative 
K  factors.  More  systematic  and  controlled  experiments  are  required  to  check 
the idea proposed in this paper. Three-dimensional K solutions for
semi-elliptical surface cracks in thick cylinders with various residual stresses
are needed to estimate the proper shape factors. Special experiments should
be performed to verify the concept of negative K factors and to determine the
variation of $f_N$ in terms of crack depth.

ABSTRACT. A thin composite shrink fit assembly is examined using an
elastic-plastic  analysis.  The  ring  and  disk  are  made  of  different  materials. 
Interferences  large  enough  to  induce  plastic  deformations  in  the  ring  are 
accounted  for.  The  ring  material  is  assumed  to  be  a  linear  strain-hardening 
material  that  obeys  Tresca's  yield  condition.  The  explicit  expressions  for 
stresses  and  deformations  in  the  shrink  fit  assembly  have  been  obtained. 
Numerical  results  are  presented  for  shrink  fit  assemblies  with  different 
geometric  ratio,  hardening  parameter,  and  different  combinations  of  materials. 

I. INTRODUCTION. The shrink fit fastening process is widely used in
industry to produce tight, precision assemblies where other fastening methods
are neither necessary nor practical. By shrinking a thin ring onto a disk of
the same thickness, an elastic state of biaxial, hydrostatic stress can be
induced in the disk. For sufficiently small values of interference of the fit,
the ring and disk remain elastic; for large values of interference, the ring
becomes plastic, first at the interface; for yet larger values of
interference, it is possible to produce a plastic state in the disk. This problem
was analyzed recently by Gamer and Lance [1] considering the same materials
for the disk and ring.

In this paper we shall examine a thin composite shrink fit assembly using
a plane-stress elastic-plastic analysis. The ring and disk are made of
different materials. Interferences large enough to induce plastic deformations
in the ring are accounted for. The ring material is assumed to be a linear
strain-hardening material that obeys Tresca's yield condition and the
associated flow rule. The stresses and deformations in the shrink fit assembly
are to be obtained as functions of the interference of the fit.

II. ELASTIC ASSEMBLY. A shrink fit assembly is shown in Figure 1. The
assembly may be produced by cooling the disk and/or heating the ring with the
manufactured interference $I$. The common interface radius of the assembly
is $a$. The thickness, $h$, is small compared to $a$, and hence the state of
stress may be assumed to be plane. All thermal effects are neglected and the
displacement is assumed to be small everywhere.

For small values of interference of fit, the stress state in the entire
assembly is elastic. The stresses and displacements in the ring are

$$\sigma_r = \frac{p}{1 - a^2/b^2}\left[\frac{a^2}{b^2} - \frac{a^2}{r^2}\right] \tag{1a}$$

$$\sigma_\theta = \frac{p}{1 - a^2/b^2}\left[\frac{a^2}{b^2} + \frac{a^2}{r^2}\right] \tag{1b}$$

$$u/r = (p/E)\left[(1+\nu)(a^2/r^2) + (1-\nu)(a^2/b^2)\right]/(1 - a^2/b^2) \tag{1c}$$

and in the disk

$$\sigma_r = \sigma_\theta = -p, \qquad u/r = -(1-\nu_1)p/E_1 \tag{2}$$
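
The elastic ring solution can be checked numerically. The sketch below assumes the ring stresses have the standard thick-ring form, $\sigma_{r,\theta} = p\,(a^2/b^2 \mp a^2/r^2)/(1 - a^2/b^2)$, together with the displacement of Eq. (1c); the function names and sample dimensions are hypothetical. It confirms the boundary conditions (radial stress $-p$ at $r = a$, zero at $r = b$) and that the displacement agrees with plane-stress Hooke's law.

```python
def ring_stresses(p, a, b, r):
    """Elastic radial and hoop stress in the ring at radius r (Eqs. 1a, 1b)."""
    k = 1.0 - a**2 / b**2
    sigma_r = p * (a**2 / b**2 - a**2 / r**2) / k
    sigma_t = p * (a**2 / b**2 + a**2 / r**2) / k
    return sigma_r, sigma_t

def ring_displacement(p, a, b, r, e_mod, nu):
    """Radial displacement of the ring at radius r (Eq. 1c)."""
    k = 1.0 - a**2 / b**2
    return r * (p / e_mod) * ((1 + nu) * a**2 / r**2 + (1 - nu) * a**2 / b**2) / k

# Hypothetical ring: a = 50 mm, b = 80 mm, p = 100 MPa, steel-like constants.
sr_a, st_a = ring_stresses(100.0, 50.0, 80.0, 50.0)
sr_b, st_b = ring_stresses(100.0, 50.0, 80.0, 80.0)
u_mid = ring_displacement(100.0, 50.0, 80.0, 60.0, 200e3, 0.3)
print(sr_a, sr_b, u_mid)
```

The largest value of $\sigma_\theta - \sigma_r$ occurs at $r = a$, which is why yielding under Tresca's condition starts at the interface.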

where $E$, $\nu$ and $E_1$, $\nu_1$ are the material constants of the ring and disk,
respectively. At the interface, $u_a$ (ring) $- u_a$ (disk) $= I$ by the compatibility
requirement. The interference pressure ($p$) is a function of the interference
($I$) given by

$$p = \frac{EI}{a}\left(1 - \frac{a^2}{b^2}\right)\Big/\left[(1+\nu) + (1-\nu)\frac{a^2}{b^2} + (1-\nu_1)\left(1 - \frac{a^2}{b^2}\right)\frac{E}{E_1}\right] \tag{3}$$

For sufficiently large values of the interference the stresses in the ring reach the yield limit. Assuming that Tresca's yield condition governs the behavior of the material, the ring first becomes plastic at the interface when the stresses satisfy

$$\sigma_\theta - \sigma_r = \sigma_0 \tag{4}$$

where $\sigma_0$ is the initial tensile yield stress. The solution for the critical interference pressure to cause incipient plastic deformation is

$$p^* = \tfrac{1}{2}\,\sigma_0\,(1 - a^2/b^2) \tag{5}$$

and it follows from Eq. (3) that the interference for the onset of plastic flow is

$$I^* = \frac{\sigma_0 a}{2E}\left[(1+\nu) + (1-\nu)\frac{a^2}{b^2} + (1-\nu_1)\left(1 - \frac{a^2}{b^2}\right)\frac{E}{E_1}\right] \tag{6}$$

which reduces to $I^* = a\sigma_0/E$ for the special case ($E_1 = E$, $\nu_1 = \nu$) considered in [1].
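
The elastic relations (3), (5) and (6) can be evaluated directly. The following is a minimal sketch, not the authors' program: the function names and the radii a = 1, b = 2 are illustrative assumptions, and the material constants are the steel ring and tungsten carbide disk values quoted in Section V (with the yield stress read as 15×10⁴ psi).

```python
def elastic_pressure(I, a, b, E, nu, E1, nu1):
    """Interference pressure p of Eq. (3) for a fully elastic assembly."""
    g = 1.0 - (a / b) ** 2
    bracket = (1 + nu) + (1 - nu) * (a / b) ** 2 + (1 - nu1) * g * E / E1
    return (E * I / a) * g / bracket


def yield_onset(a, b, E, nu, E1, nu1, sigma0):
    """Critical pressure p* of Eq. (5) and interference I* of Eq. (6)."""
    g = 1.0 - (a / b) ** 2
    bracket = (1 + nu) + (1 - nu) * (a / b) ** 2 + (1 - nu1) * g * E / E1
    p_star = 0.5 * sigma0 * g
    I_star = 0.5 * (sigma0 * a / E) * bracket
    return p_star, I_star


# Assumed geometry (a = 1, b = 2, so a/b = 0.5) and Section V materials.
a, b = 1.0, 2.0
E, nu, sigma0 = 30e6, 0.3, 15e4      # steel ring
E1, nu1 = 88.5e6, 0.258              # tungsten carbide disk

p_star, I_star = yield_onset(a, b, E, nu, E1, nu1, sigma0)
print("p*/sigma0        =", p_star / sigma0)
print("(E/sigma0) I*/a  =", E * I_star / (sigma0 * a))
```

Feeding I* back into `elastic_pressure` returns p*, which is a quick consistency check between Eqs. (3), (5) and (6).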

III. PARTIALLY PLASTIC ASSEMBLY. For values of interference larger than that given by Eq. (6), a plastic zone forms in the ring, so that for $a \le r \le \rho$ the ring is plastic, while for $\rho \le r \le b$ the ring material is still in an elastic state. The elastic-plastic interface radius $\rho$ is a function of the interference $I$.

We assume that the ring is made of a linear work-hardening material which obeys Tresca's yield condition

$$\sigma_\theta - \sigma_r = \sigma \tag{7}$$

where the yield stress $\sigma$ is a function of the plastic strain $\varepsilon^p$. For a linear work-hardening material, we have

$$\sigma = \sigma_0(1 + \eta\,\varepsilon^p) \quad\text{and}\quad \eta = (E/\sigma_0)\,m/(1-m) \tag{8}$$

where $\eta$ (or $m$) is the hardening parameter.

Applying the usual flow rule and following the method of analysis reported by Gamer and Lance [1] and Bland [2], the expressions for the stresses and the displacement can be obtained explicitly. The complete solution in $a \le r \le \rho$ is:

$$\sigma_r = \sigma_0(1-m)\ln\frac{r}{a} - \tfrac{1}{2}(1-\nu)\,m\,D\,r^{-2} + C \tag{9}$$

$$\sigma_\theta = \sigma_0(1-m)\left[1 + \ln\frac{r}{a}\right] + \tfrac{1}{2}(1-\nu)\,m\,D\,r^{-2} + C \tag{10}$$

$$u = (1-\nu)\left\{\frac{\sigma_0}{E}(1-m)\,r\ln\frac{r}{a} - \tfrac{1}{2}(1-\nu)\,m\,\frac{D}{E}\,r^{-1} + \frac{C}{E}\,r + \frac{D}{E}\,r^{-1}\right\} \tag{11}$$

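In the elastic zone, $\rho \le r \le b$, the stresses and the displacement are:

$$\sigma_r = \frac{E}{1-\nu}\,A - \frac{E}{1+\nu}\,\frac{B}{r^2} \tag{12}$$

$$\sigma_\theta = \frac{E}{1-\nu}\,A + \frac{E}{1+\nu}\,\frac{B}{r^2} \tag{13}$$

$$u = Ar + B/r \tag{14}$$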


The constants $A$, $B$, $C$, $D$, $p$, and $\rho$ all depend on the interference $I$, and can be evaluated by considering the following conditions: continuity of stress and displacement at $r = \rho$ requires $\sigma_r(\rho^-) = \sigma_r(\rho^+)$ and $u(\rho^-) = u(\rho^+)$. At the ring-disk interface $\sigma_r(a) = -p$ and at the outer surface of the ring $\sigma_r(b) = 0$. The yield condition in Eq. (7) must be satisfied at $r = \rho$ and finally, compatibility of the displacement field with the interference $I$ requires that $u(a^+) - u(a^-) = I$. These conditions are sufficient to determine all unknown parameters. In this paper the constants $A$, $B$, $C$, $D$ are determined as functions of $\rho$:

$$A = \tfrac{1}{2}(1-\nu)(\sigma_0/E)(\rho/b)^2, \qquad B = \tfrac{1}{2}(1+\nu)(\sigma_0/E)\,\rho^2$$

$$C = \sigma_0\left[\tfrac{1}{2}m - (1-m)\ln(\rho/a) - \tfrac{1}{2}(1 - \rho^2/b^2)\right], \qquad D = \sigma_0\,\rho^2/(1-\nu) \tag{15}$$

The dimensionless interference pressure and interference are given, respectively, by

$$\bar{P} = p/\sigma_0 = \tfrac{1}{2}(1 - \rho^2/b^2) + (1-m)\ln(\rho/a) + \tfrac{1}{2}m(\rho^2/a^2 - 1) \tag{16}$$

$$\bar{I} = (E/\sigma_0)\,I/a = (\rho/a)^2 - \left[(1-\nu) - (1-\nu_1)E/E_1\right](p/\sigma_0) \tag{17}$$
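
The boundary and continuity conditions listed above can be checked numerically. The sketch below is illustrative only: `ring_fields` is a hypothetical helper, the plastic-zone expressions are taken in the form $\sigma_r = \sigma_0(1-m)\ln(r/a) - \tfrac12 m(1-\nu)D/r^2 + C$ with $u = (1-\nu)(r\sigma_r + D/r)/E$, and $C$ is taken with $\ln(\rho/a)$, which is what continuity of $\sigma_r$ at $r = \rho$ requires.

```python
import math


def ring_fields(r, rho, a, b, E, nu, sigma0, m):
    """Return (sigma_r, sigma_theta, u) in the ring at radius r.

    rho is the elastic-plastic interface radius, a <= rho <= b.
    """
    A = 0.5 * (1 - nu) * (sigma0 / E) * (rho / b) ** 2
    B = 0.5 * (1 + nu) * (sigma0 / E) * rho ** 2
    C = sigma0 * (0.5 * m - (1 - m) * math.log(rho / a)
                  - 0.5 * (1 - rho ** 2 / b ** 2))
    D = sigma0 * rho ** 2 / (1 - nu)
    if r <= rho:
        # Plastic zone: hardening term h changes sign between the two stresses.
        h = 0.5 * m * (1 - nu) * D / r ** 2
        s_r = sigma0 * (1 - m) * math.log(r / a) - h + C
        s_t = sigma0 * (1 - m) * (1 + math.log(r / a)) + h + C
        u = (1 - nu) * (r * s_r + D / r) / E
    else:
        # Elastic zone: Lame-type solution with constants A and B.
        s_r = E * A / (1 - nu) - E * B / ((1 + nu) * r ** 2)
        s_t = E * A / (1 - nu) + E * B / ((1 + nu) * r ** 2)
        u = A * r + B / r
    return s_r, s_t, u


# Assumed example: a = 1, b = 2, rho = 1.5, steel ring with m = 0.1.
args = dict(rho=1.5, a=1.0, b=2.0, E=30e6, nu=0.3, sigma0=15e4, m=0.1)
print("pressure p/sigma0 =", -ring_fields(1.0, **args)[0] / args["sigma0"])
print("sigma_r at r = b  =", ring_fields(2.0, **args)[0])
```

The first printed value should agree with Eq. (16) evaluated at the same $\rho/a$, and the second should vanish to rounding error.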

When the ring and disk are made of the same material, i.e., $E_1 = E$, $\nu_1 = \nu$, Eq. (17) reduces to the simple formula $(E/\sigma_0)\,I/a = (\rho/a)^2$. For this special case [1], the constants $A$, $B$, $C$, $D$, $p$, and $\rho$ can be expressed explicitly as functions of interference $I$. In general, the interference pressure ($p$) is related to the interference ($I$) implicitly through the elastic-plastic interface radius ($\rho$) as shown in Eqs. (16) and (17) for $a \le \rho \le b$. The upper limit of the partially plastic assembly is obtained by letting $\rho = b$. The corresponding interference pressure ($p^{**}$) and interference ($I^{**}$) are

$$p^{**}/\sigma_0 = (1-m)\ln(b/a) + \tfrac{1}{2}m(b^2/a^2 - 1)$$

$$(E/\sigma_0)\,I^{**}/a = (b/a)^2 - \left[(1-\nu) - (1-\nu_1)E/E_1\right]p^{**}/\sigma_0 \tag{18}$$
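
Because Eqs. (16) and (17) are parametric in $\rho$, a pressure-interference curve is traced by sweeping $\rho/a$ from 1 to $b/a$. A minimal sketch follows; the function name and argument layout are assumptions, and the material data are the steel ring and tungsten carbide disk of Section V.

```python
import math


def partially_plastic(rho_over_a, alpha, m, nu, nu1, E_ratio):
    """Dimensionless pressure, Eq. (16), and interference, Eq. (17).

    alpha = a/b is the geometric ratio and E_ratio = E/E1.
    """
    x = rho_over_a
    P = (0.5 * (1 - (x * alpha) ** 2)
         + (1 - m) * math.log(x)
         + 0.5 * m * (x ** 2 - 1))
    I = x ** 2 - ((1 - nu) - (1 - nu1) * E_ratio) * P
    return P, I


# Steel ring (nu = 0.3) on a tungsten carbide disk (nu1 = 0.258), alpha = 0.5.
for m in (0.0, 0.1, 0.2):
    row = [partially_plastic(x, 0.5, m, 0.3, 0.258, 30.0 / 88.5)[1]
           for x in (1.0, 1.5, 2.0)]
    print("m =", m, "interference at rho/a = 1, 1.5, 2:",
          [round(v, 3) for v in row])
```

Setting `rho_over_a = 1` gives the onset of yielding and `rho_over_a = 1/alpha` gives the limit (18), so the same routine covers both ends of the partially plastic range.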


IV. FULLY PLASTIC ASSEMBLY. When the interference $I$ is larger than $I^{**}$, we have reached the fully plastic state in the ring. In this case, the expressions for the stresses and the displacement in $a \le r \le b$ are still the same as those given by Eqs. (9), (10), and (11). The constants $C$, $D$ and the interference pressure $p$ are determined with the boundary conditions $\sigma_r(a) = -p$, $\sigma_r(b) = 0$, and the compatibility requirement $u(a^+) - u(a^-) = I$. The results for the constants are

$$C = \left[p\,a^2/b^2 - (1-m)\,\sigma_0\ln(b/a)\right]/(1 - a^2/b^2)$$

$$D = 2a^2\left[p - (1-m)\,\sigma_0\ln(b/a)\right]/\left[m(1-\nu)(1 - a^2/b^2)\right] \tag{19}$$

and the interference pressure is given as a function of interference by

$$\frac{p}{\sigma_0} = \frac{m\,(EI/\sigma_0 a)(1 - a^2/b^2) + 2(1-m)\ln(b/a)}{2 - m\left[(1-\nu) - (1-\nu_1)E/E_1\right](1 - a^2/b^2)} \tag{20}$$
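
Equation (20) should join the partially plastic solution continuously at $I = I^{**}$. The sketch below checks this under the reading $\bar I = EI/(\sigma_0 a)$ for the garbled factor in the numerator; the function names are hypothetical.

```python
import math


def complete_yield(alpha, m, nu, nu1, E_ratio):
    """p**/sigma0 and (E/sigma0) I**/a from Eq. (18); alpha = a/b, E_ratio = E/E1."""
    P = (1 - m) * math.log(1 / alpha) + 0.5 * m * (1 / alpha ** 2 - 1)
    I = 1 / alpha ** 2 - ((1 - nu) - (1 - nu1) * E_ratio) * P
    return P, I


def fully_plastic_pressure(I_bar, alpha, m, nu, nu1, E_ratio):
    """p/sigma0 from Eq. (20) for a dimensionless interference I_bar >= I**."""
    g = 1 - alpha ** 2
    c = (1 - nu) - (1 - nu1) * E_ratio
    return (m * I_bar * g + 2 * (1 - m) * math.log(1 / alpha)) / (2 - m * c * g)


for m in (0.0, 0.1, 0.2):
    P_ss, I_ss = complete_yield(0.5, m, 0.3, 0.258, 30.0 / 88.5)
    P_at_limit = fully_plastic_pressure(I_ss, 0.5, m, 0.3, 0.258, 30.0 / 88.5)
    P_beyond = fully_plastic_pressure(1.5 * I_ss, 0.5, m, 0.3, 0.258, 30.0 / 88.5)
    print(m, P_ss, P_at_limit, P_beyond)
```

For $m = 0$ the pressure stays at $\ln(b/a)$ for any interference beyond $I^{**}$, while for $m > 0$ it grows linearly with the interference.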


V. NUMERICAL RESULTS AND DISCUSSIONS. The analysis described above makes it possible to predict the interference pressure in a composite shrink fit assembly, and hence, determine the stress state in the ring and disk as a function of the interference. The numerical results have been obtained for shrink fit assemblies with different geometric ratios ($\alpha = a/b$), hardening parameters ($m$), and combinations of materials. For a steel ring with $\alpha = 0.5$, $m = 0.0$, $E = 30\times10^6$ psi, $\nu = 0.3$, $\sigma_0 = 15\times10^4$ psi, we have considered three types of disks: (a) rigid disk with $E_1 = 1000E$, $\nu_1 = 0.0$, $\sigma_1 = 1000\sigma_0$; (b) steel disk of the same material as the ring; (c) a disk made of tungsten carbide with $E_1 = 88.5\times10^6$ psi, $\nu_1 = 0.258$, $\sigma_1 = 50\times10^4$ psi. The numerical results of the interference pressure ($p/\sigma_0$) for these three cases are presented graphically in Figure 2 as functions of the interference ($\bar I$). The results of the hoop stress at the inside surface of the ring are presented in Figure 3 also for these three cases. As can be seen from these two figures, the results for the composite shrink fit assembly (c) fall between the two limits established by cases (a) and (b).

For composite shrink fit assemblies made of a tungsten carbide disk and a steel ring with $\alpha = 0.5$, $m = 0.0, 0.1, 0.2$, the results are presented in Figures 4 and 5, respectively, for the interference pressure and the hoop stress at the bore as functions of the interference. The effect of the hardening parameter ($m$) on these relations can be seen from these two figures. For the same combination of composite shrink fit assembly with $m = 0.05$, $\alpha = 1/4, 1/3, 1/2, 3/4$, the results showing the effect of the geometric ratio ($\alpha$) are shown in Figures 6 and 7 for the interference pressure and hoop stress at the bore, respectively.

Numerical results of the stresses and displacements in composite shrink fit assemblies have also been obtained, but only some results are presented here. The distributions of hoop stresses in a steel ring with $\alpha = 0.5$ are shown in Figures 8, 9, and 10 for $m = 0.0, 0.1, 0.2$, respectively. In each figure, we have shown the results corresponding to four stages of interference: (a) initial yielding ($\rho/a = 1.0$), $\bar I^* = 0.832$; (b) partial yielding ($\rho/a = 1.5$), $\bar I = 1.970, 1.960, 1.950$; (c) complete yielding ($\rho/a = 2.0$), $\bar I^{**} = 3.689, 3.653, 3.617$; (d) fully plastic state with $\bar I = 1.5\,\bar I^{**}$. For an ideally plastic ring ($m = 0.0$), the stress distribution remains unchanged after complete yielding has been reached. For strain-hardening rings, the stress distributions show large variations, especially for large values of interference. As shown in Figures 8, 9, and 10, the hardening parameter has a significant effect on the stress distributions. Additional stress distributions in the ring with $m = 0.1$ are shown in Figures 11 and 12 for $\alpha = 1/3$ and $1/4$, respectively. The effect of geometric ratio on the distributions can be seen by comparing Figures 9, 11, and 12.

Figure 1. Shrink fit assembly.

Figure 2. Interference pressure versus interference for three shrink fit assemblies ($\alpha = 0.5$, $m = 0.0$).

Figure 3. Hoop stress at the bore versus interference for three shrink fit assemblies ($\alpha = 0.5$, $m = 0.0$).

Figure 4. The effect of hardening on interference pressure in a composite assembly ($\alpha = 0.5$).

Figure 5. The effect of hardening on the hoop stress at the bore of a steel ring ($\alpha = 0.5$).

Figure 6. The effect of geometric ratio on interference pressure in a composite assembly ($m = 0.05$).


A SHALLOWLY CURVED SHEAR-DEFORMABLE BEAM ELEMENT

ABSTRACT. Shallowly curved beam elements, including shear deformation and rotary inertia effects, are derived from Hamilton's variational principle. Different degree polynomials, labeled 'anisoparametric', are
used  to  interpolate  the  kinematic  variables,  instead  of  uniform  interpo¬ 
lations  as  in  the  conventional  isoparametric  procedure.  This  approach 
yields  a  correct  representation  of  the  bending  strain  and,  importantly, 
the  membrane  and  transverse  shear  strains.  Consequently,  the  severe 
shortcomings  of  the  exactly  integrated  isoparametric  elements,  charac¬ 
terized  by  excessively  stiff  solutions  in  the  thin  regime  (a  phenomenon 
often  referred  to  as  membrane  and  shear  locking),  are  overcome.  Uniform 
(isoparametric-like)  nodal  patterns  are  achieved  by  explicitly  enforcing 
higher-degree  penalty  modes  in  the  membrane  and  shear  strains.  This 
procedure  preserves  the  compatibility  of  the  kinematic  field  and  the  ca¬ 
pability  of  the  element  to  move  rigidly  without  straining.  Exact 
quadratures  are  used  on  all  element  matrices,  producing  a  correct  rank 
stiffness  matrix,  a  consistent  load  vector,  and  a  consistent  mass  ma¬ 
trix.  The  elements  suffer  no  limitations  over  the  entire  theoretical 
range  of  the  slenderness  ratio.  For  further  enhancement  and,  particu¬ 
larly,  in  coarse-mesh  situations,  an  effective  relaxation  of  penalty 
constraints  at  the  local  element  level  is  introduced.  This  technique 
ensures  a  well-conditioned  stiffness  matrix.  Although  the  element  pen¬ 
alty  constraints  are  relaxed,  the  corresponding  global  structure  con¬ 
straints  are  enforced  as  is  required  by  the  analytic  theory.  Particular 
attention is given to the simplest element, a two-node, six
degree-of-freedom beam in which all strains are constant. Solutions to
static  and  free  vibration  arch  and  ring  problems  are  presented,  demon¬ 
strating  the  exceptional  modeling  capabilities  of  this  element. 

I. INTRODUCTION. Early formulations for curved beam models employed the assumptions of Bernoulli-Euler theory and often produced elements that lacked rigid body motion and at the same time exhibited severe stiffening in approximating the behavior of thin and deep arches [1-4]. These classical elements require C° and C¹ continuity for the
membrane  and  transverse  displacements,  respectively.  Using  mixed 
polynomial-trigonometric  shape  functions,  and  at  the  expense  of  higher 
continuity  enforcement  [5-8],  proper  rigid-body  motion  was  obtained, 
though  the  full  extension  into  the  thin  regime  was  not  attained.  Fried 
[9-10]  identified  the  source  of  the  thin-regime  difficulty  as  one  of 
imbalance  between  discretization  and  inextensibility  errors,  and  he  em¬ 
ployed  his  'residual'  energy  balancing  technique  to  arrive  at  well-be¬ 
haved  thin-arch  and  cylindrical  shell  elements.  Meek  [11]  showed  that 
even  without  proper  rigid-body  motion,  effective  thin-arch  elements 
could  be  produced  once  'consistent'  displacement  polynomials  were  used. 
He  interpolated  the  membrane  displacement  with  a  polynomial  one  degree 
higher than that used for the transverse displacement and achieved physically meaningful membrane-bending coupling in the membrane strain. Mixed and selective/reduced integration approaches based upon isoparametric interpolations [12-16] produced several well-behaved elements. In certain cases of reduced integration, however, spurious zero-energy modes were introduced, or the required membrane-bending coupling was violated [14].

The cause of thin-regime stiffening in the classical formulations (i.e., 'membrane locking') can be traced to the penalty-type membrane strain energy which involves membrane-bending coupling. Extensional deformations in thick beams and inextensibility of thin beams are both governed by a penalty parameter which becomes large in the thin regime. For very slender elements, membrane and transverse displacements, originally assumed independent, become interdependent because of coupling in the membrane strain and the vanishing strain requirement imposed by a large penalty parameter. When a membrane-strain state lacks the theoretically required membrane-bending coupling, commonly the result of interpolation inconsistency (e.g., when isoparametric interpolations are used), spurious (nonphysical) constraining of a single kinematic field takes place. This in turn yields severe constraining of the bending strain, giving rise to excessively stiff deformations.

The  behavior  of  shear-deformable  (C°)  straight  beam  and  flat 
plate/shell  elements  is  also  governed  by  a  penalty  mechanism  from  the 
vanishing-strain  enforcement.  There,  the  penalized  shear  strain  energy 
contains  a  deflection-rotation  coupling  in  the  shear  strains.  For 
straight Timoshenko beams, Tessler and Dong [17] demonstrated a variationally consistent approach which yielded simple displacement elements devoid of any slenderness-related deficiency. Their kinematic interpolations ensured physically admissible coupling in all shear-strain states, analogous to Meek's [11] membrane-bending coupling in thin-arch elements. By adopting the deflection polynomial one degree higher than that for the normal rotation [17], uniform nodal patterns were produced by explicitly enforcing the higher-degree Kirchhoff (zero-shear) modes. The Tessler-Dong elements possess the same stiffness matrices and, in fact, use the same integration rule as the corresponding selective/reduced integration elements [18]; however, in their elements [17] the
integrations  are  exact.  Further,  in  the  cases  of  distributed  static  and 
inertial  loadings,  these  elements  perform  significantly  better  than 
their  reduced  integration  counterparts.  This  is  because  consistent  load 
vectors  and  mass  matrices  evolve  naturally  in  the  Tessler-Dong  formula¬ 
tion,  but  are  not  available  within  the  reduced  integration  procedures. 
(Referring  to  exactly  integrated  load  vectors  and  mass  matrices  as 
'consistent'  for  underintegrated  stiffnesses,  as  is  often  done  [15,16], 
is  in  fact  erroneous;  refer  to  the  discussion  in  [17].) 

Recent extensions of the one-dimensional interpolation strategy [17] to axisymmetric shell and plate elements were introduced by Tessler [19-21] and Tessler and Hughes [22-23], where to maintain variational consistency and kinematic reliability (i.e., correct rank of a stiffness matrix) full Gaussian quadratures were employed on all energy and work terms. It was pointed out [22] that in the case of a quadrilateral element, it is not always possible to achieve proper deflection-rotation coupling in all shear-strain states. Further, kinematic restraints, such as boundary conditions, can readily uncouple a consistently coupled shear-strain mode (e.g., by removing deflection degrees of freedom), causing the elements to lock [23]. Thus, interpolation measures alone were insufficient to ensure proper thin-regime behavior.

The  fundamental  difficulty  stems  from  the  direct  adoption  of  ana¬ 
lytic  theory  to  finite  elements,  which  results  in  the  strict  enforcement 
of  vanishing  penalty  strains  at  the  element  level.  The  fact  that  van¬ 
ishing  shear  and  extensional  deformations  pertain  only  at  the  global 
structure  level  has  largely  been  overlooked.  Because  of  the  limited 
kinematic  freedoms  afforded  by  element  interpolations,  it  would  seem  on¬ 
ly  natural  to  depart  from  the  conventional  approach  and  adopt  what  we 
shall  refer  to  as  the  local  penalty-relaxation  method.  The  idea  is  to 
'relax'  the  enforcement  of  element  penalty  strains,  yet  maintain  the 
global  penalty  constraints  of  the  discretized  structure.  The  actual 
implementation  is  via  correction  parameters  for  the  penalty  stress 
resultants  (i.e.,  membrane  and  shear  forces),  which  also  appear  in  the 
element  penalty  parameters.  Although  implemented  somewhat  differently, 
the  techniques  of  Fried  [9-10],  MacNeal  [24],  and  Tessler  and  Hughes 
[22-23]  are  closely  related  and  can  be  characterized  as  local 
penalty-relaxation  methods.  The  principal  benefits  of  these  procedures 
are  the  removal  of  the  locking  deficiency,  the  improved  condition  of  the 
stiffness  matrix,  and  the  improved  solutions  in  coarse-mesh  situations. 

The  objective  of  the  present  paper  is  to  extend  the  ideas  explored 
in  [11,17,19-23]  to  shallowly  curved,  shear-deformable  beam  elements. 
The  reason  for  adopting  a  shallow  element  geometry  is  to  effectively 
model  shallow  as  well  as  deeply  curved  beams  [7].  The  inclusion  of
shear  deformation  and,  in  dynamics,  rotary  inertia  extends  the  range  of 
applicability  to  moderately  thick  regimes.  The  principal  goal  is  to 
achieve  a  reliable  and  effective  element  behavior  that  is  not  subject  to 
any  thinness  restrictions,  so  that  the  basic  methodology  can  be  applied 
to  general  curved  shell  models. 

To ensure effective element behavior in the limiting thin membrane/bending regimes, consistent interpolations (labeled 'anisoparametric' [23]) and suitable corrections (relaxations) of the element-level membrane and shear penalty modes are incorporated. From a
hierarchy  of  these  anisoparametric  elements,  we  examine  the  simplest, 
two-node  (six  degrees  of  freedom)  element,  in  which  all  strain  compo¬ 
nents  are  constant  along  its  span.  The  element  is  C°  compatible,  devoid 
of  locking  even  in  the  extremely  thin  regime,  and  it  can  move  rigidly 
without  straining.  Its  stiffness  matrix  is  well-conditioned  over  the 
entire  range  of  the  element  slenderness  ratio,  which  implies  ideal 
suitability  for  applications  on  microcomputers  with  a  short  word  length. 

In  Section  2,  the  structural  theory  of  shallowly  curved  beams  [25] 
is  briefly  described.  In  Section  3,  a  variational  implementation  of  the 
theory  with  anisoparametric  displacement  fields  is  discussed  for  a 
hierarchy  of  elements,  whereas  in  Section  4  a  two-noded  element  is 
derived.  Implementation  of  the  local  penalty  relaxation  concept  is 
addressed  in  Section  5.  Finally,  solutions  to  static  and  vibration  test
problems  are  presented  in  Section  6,  and  conclusions  are  summarized  in
Section  7. 


II.  ANALYTIC  THEORY.  To  establish  a  conceptual  basis,  the 
Timoshenko/Marguerre  shallow  beam  equations  [25-26]  are  discussed  within 
the  framework  of  linearly  elastic,  planar  deformations. 


Let the curved beam of a uniform rectangular cross-section be located in the x-z plane (which is the plane of cross-sectional symmetry) where the x-axis is coincident with the beam chord (refer to Figure 1). The middle-surface membrane displacement, $u(x,t)$ (henceforth, $t$ denotes time), transverse displacement, $w(x,t)$, and normal cross-sectional rotation, $\theta(x,t)$, completely describe the planar membrane, bending, and transverse shear deformations, with the strains given by:

$$\varepsilon = \varepsilon_u + \varepsilon_w = u_{,x} + w_{I,x}\,w_{,x} \tag{2.1}$$

$$\kappa = -\theta_{,x} \tag{2.2}$$

$$\gamma = \gamma_w + \gamma_\theta = w_{,x} - \theta \tag{2.3}$$

where $w_I$ describes the shallow shape of the beam ($w_{I,x}^2 \ll 1$). Evidently, for a straight beam $\varepsilon_w = 0$, and all strains (2.1)-(2.3) are those of Timoshenko theory [26].


The  kinetic  variables  of  the  theory  are  the  membrane  force  resul¬ 
tant,  N(x,t),  the  bending  moment,  M(x,t),  and  the  transverse  shear 
force,  Q(x,t).  These  are  related  to  the  corresponding  strains  by  the 
constitutive  relations: 


$$N = D_m\,\varepsilon = AE\,\varepsilon, \qquad M = D_b\,\kappa = EI\,\kappa, \qquad Q = D_s\,\gamma = k^2GA\,\gamma, \tag{2.4}$$

with $A$, $I$, $E$, $G$, and $k^2$ denoting the cross-sectional area, the moment of inertia, the elastic modulus, the shear modulus, and the shear correction factor, respectively.

To  derive  the  equations  of  motion,  Hamilton's  variational  statement 
is  invoked,  i.e., 

$$\delta\int_{t_0}^{t_1} L\,dt = \delta\int_{t_0}^{t_1}\Big\{\tfrac{1}{2}\int_0^{\ell}\left[\rho A(\dot u^2 + \dot w^2) + \rho I\,\dot\theta^2\right]dx - \tfrac{1}{2}\int_0^{\ell}\left[N\varepsilon + M\kappa + Q\gamma\right]dx + \int_0^{\ell}\left[wq + \theta m\right]dx\Big\}\,dt = 0 \tag{2.5}$$

where a superior dot denotes differentiation with respect to $t$, $\rho$ is the mass density, $\ell$ is the chord length, and $q$ and $m$ are the distributed transverse force and moment loadings, respectively. Conveniently, due to the shallowness assumption, the energy integrals are taken over a straight chord rather than a curved beam description. (This aspect contributes to the desired simplicity in formulating shell elements.)


III.  KINEMATIC  CONSIDERATIONS.  Of  special  significance  in  the 
present  theory  is  a  double-penalty  form  of  the  elastic  strain  energy, 
which  can  be  expressed  as 


$$U = \frac{EI}{2\ell}\,(U_b + \alpha_m U_m + \alpha_s U_s), \tag{3.1}$$

where the nondimensional bending, membrane and shear energy contributions are respectively

$$U_b = \ell\int_0^{\ell}\kappa^2\,dx, \qquad U_m = \frac{1}{\ell}\int_0^{\ell}\varepsilon^2\,dx, \qquad U_s = \frac{1}{\ell}\int_0^{\ell}\gamma^2\,dx, \tag{3.2}$$

and the penalty parameters have the form

$$\alpha_m = (\ell/r)^2, \qquad \alpha_s = k^2(G/E)(\ell/r)^2, \tag{3.3}$$

with $r = \sqrt{I/A}$ denoting the cross-sectional radius of gyration.

For a very slender beam $\alpha_m, \alpha_s \to \infty$. These parameters enforce thin limits of the membrane inextensibility and shearless (Kirchhoff) regimes:

Inextensibility constraint: $\varepsilon = \varepsilon_u + \varepsilon_w \to 0$ (3.4)

Kirchhoff constraint: $\gamma = \gamma_w + \gamma_\theta \to 0$ (3.5)

The  conventional  approach  of  directly  applying  the  theory  to  finite 
elements  treats  the  displacement  variables  and  the  material  and  geo¬ 
metric  quantities  as  element  properties.  This  means  that  the  penalty 
constraints  (3.4)  and  (3.5)  are  enforced  at  the  element  level. 
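
To get a feeling for how quickly the penalty parameters of (3.3) grow with slenderness, they can be tabulated for a rectangular section. This is a sketch under stated assumptions: an isotropic material with $G/E = 1/[2(1+\nu)]$, $\nu = 0.3$ and $k^2 = 5/6$ are illustrative choices, not values taken from the paper.

```python
def penalty_parameters(ell, h, nu=0.3, k2=5.0 / 6.0):
    """alpha_m and alpha_s of (3.3) for a rectangular section of depth h.

    Assumes r = sqrt(I/A) = h/sqrt(12) and G/E = 1/(2(1+nu)).
    """
    r = h / 12 ** 0.5
    alpha_m = (ell / r) ** 2
    alpha_s = k2 / (2 * (1 + nu)) * alpha_m
    return alpha_m, alpha_s


for ratio in (10, 100, 1000, 10000):
    a_m, a_s = penalty_parameters(ell=float(ratio), h=1.0)
    print(f"l/h = {ratio:>6}: alpha_m = {a_m:.3e}, alpha_s = {a_s:.3e}")
```

Both parameters scale with $(\ell/h)^2$, so an element that is 1000 times longer than it is deep weights the membrane and shear energies roughly $10^7$ times more heavily than the bending energy.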

Since  the  highest  spatial  derivative  in  (2.5)  is  of  order  one,  the 
displacement  variables  require  only  C°  continuity.  This  implies  a  wide 
range  of  interpolating  possibilities.  On  the  other  hand,  penalty  con¬ 
straints  (3.4)  and  (3.5)  limit  the  choice  of  interpolations  to  those 
yielding  identical  polynomial  descriptions  for  the  components  of  the 
penalty  strains  [17,19-23].  In  the  present  context  the  displacement  in¬ 
terpolations  should  yield 

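$$\varepsilon_u,\ \varepsilon_w \sim O(x^m); \qquad \gamma_w,\ \gamma_\theta \sim O(x^n) \qquad (m, n = 0, 1, 2, \ldots) \tag{3.6}$$

Kinematic interpolations producing penalty strains of type (3.6) were labeled 'anisoparametric' [23] to emphasize the necessary distinction in the polynomial variations of the displacement variables.

If the initial element shape, $w_I(x)$, is described by the 'shallow' cubic ($\beta_i^2 \ll 1$):

$$w_I = \ell\left[\beta_0\,\eta(1-\eta)^2 - \beta_1\,\eta^2(1-\eta)\right], \tag{3.7}$$

where

$$\eta = x/\ell, \quad \eta \in [0,1]; \qquad \beta_i = w_{I,x}(\eta_i), \quad i = 0, 1,$$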

then  to  comply  with  (3.6),  the  three  displacement  variables  should  be 
expanded  by  the  distinct-degree  anisoparametric  polynomials: 

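$$\theta = \sum_{k=0}^{p} a_{\theta k}\,\eta^k, \qquad w = \sum_{k=0}^{p+1} a_{wk}\,\eta^k, \qquad u = \sum_{k=0}^{p+3} a_{uk}\,\eta^k \tag{3.8}$$

where $a_\theta$, $a_w$ and $a_u$ are the generalized coordinates in terms of the $\theta$, $w$ and $u$ nodal degrees of freedom (dof), respectively. Introducing (3.8) into (2.1)-(2.3) yields

$$\varepsilon = \sum_{k=0}^{p+2} \varepsilon_k\,\eta^k, \qquad \kappa = \sum_{k=0}^{p-1} \kappa_k\,\eta^k, \qquad \gamma = \sum_{k=0}^{p} \gamma_k\,\eta^k. \tag{3.9}$$

The strains (3.9) contain two sets of penalty modes arising from the thinness constraints (3.4) and (3.5):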


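$p+3$ inextensional modes; for example, for $p = 1$:

$$\begin{aligned}
\varepsilon_0\,\ell &= a_{u1} + \beta_0\,a_{w1} \to 0,\\
\varepsilon_1\,\ell &= 2\left[a_{u2} + \beta_0\,a_{w2} - (\beta_1 + 2\beta_0)\,a_{w1}\right] \to 0,\\
\varepsilon_2\,\ell &= 3\left[a_{u3} + (\beta_0 + \beta_1)\,a_{w1}\right] - 4(\beta_1 + 2\beta_0)\,a_{w2} \to 0,\\
\varepsilon_3\,\ell &= 4a_{u4} + 6(\beta_0 + \beta_1)\,a_{w2} \to 0
\end{aligned} \tag{3.10}$$

$p+1$ Kirchhoff modes; for $p = 1$:

$$\begin{aligned}
\gamma_0\,\ell &= a_{w1} - \ell\,a_{\theta 0} \to 0,\\
\gamma_1\,\ell &= 2a_{w2} - \ell\,a_{\theta 1} \to 0.
\end{aligned} \tag{3.11}$$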
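
The penalty modes for $p = 1$ can be reproduced by multiplying out the polynomials. The sketch below uses exact rational arithmetic and assumes the shallow cubic has end slopes $\beta_0$, $\beta_1$, so that its slope is $w_{I,x} = \beta_0 - 2(\beta_1 + 2\beta_0)\eta + 3(\beta_0 + \beta_1)\eta^2$; all helper names are hypothetical.

```python
from fractions import Fraction as F


def pmul(p, q):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def padd(p, q):
    """Sum of two polynomials of possibly different length."""
    n = max(len(p), len(q))
    return [(p[k] if k < len(p) else F(0)) + (q[k] if k < len(q) else F(0))
            for k in range(n)]


def pder(p):
    """Derivative with respect to eta."""
    return [k * c for k, c in enumerate(p)][1:] or [F(0)]


def penalty_modes(a_u, a_w, a_t, b0, b1, ell):
    """Coefficients of ell*eps and ell*gamma in powers of eta for p = 1."""
    wI_x = [b0, -2 * (b1 + 2 * b0), 3 * (b0 + b1)]
    eps = padd(pder(a_u), pmul(wI_x, pder(a_w)))   # ell * (u,x + wI,x w,x)
    gam = padd(pder(a_w), [-ell * c for c in a_t]) # ell * (w,x - theta)
    return eps, gam


# Arbitrary generalized coordinates: quartic u, quadratic w, linear theta.
a_u = [F(1), F(2), F(-3), F(5), F(7)]
a_w = [F(4), F(-1), F(6)]
a_t = [F(2), F(9)]
eps, gam = penalty_modes(a_u, a_w, a_t, F(1, 5), F(-1, 10), F(3))
print(eps, gam)
```

Each returned coefficient mixes at least two of the fields $u$, $w$, $\theta$, which is the coupling that the text identifies as necessary for a non-locking thin limit.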

Important  considerations  in  the  element  construction  are  the  number 
of  nodal  dof,  their  locations,  as  well  as  the  order  of  the  strain 
(stress)  approximations.  Thus,  assumptions  (3.8)  produce  an  element 
with  (3p+7)  dof,  and  (p+2),  (p-l),  and  p  polynomial  degrees  for  the 
membrane,  bending  and  shear  strains,  respectively.  It  can  be  seen  from 
(3.10)  and  (3.11)  that  a  reduction  in  the  number  of  dof  is  possible  by 
lowering the degree of the strain interpolations. This can be accomplished prior to integrating the element matrices by explicitly enforcing the higher-degree penalty modes [17]. By this procedure, a hierarchy of elements with the desired number of dof, nodes, and spatial variations of strains (stresses) can be constructed while retaining the initial polynomial variations of the assumed displacements and preserving a rigid-body motion. Figures 2 and 3 show some nodal and strain variation possibilities for the three lowest order elements ($p = 1, 2, 3$) in their initial and constrained configurations, respectively.
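
The bookkeeping behind these counts is simple enough to tabulate. The following sketch (the function name and dictionary keys are illustrative) lists, for a given order $p$, the dof of the unconstrained element, the strain degrees, and the number of penalty modes.

```python
def element_profile(p):
    """Counts for the initial (unconstrained) anisoparametric element of order p."""
    # A polynomial of degree d carries d + 1 coefficients.
    dof = {"theta": p + 1, "w": (p + 1) + 1, "u": (p + 3) + 1}
    strain_degree = {"membrane": p + 2, "bending": p - 1, "shear": p}
    modes = {"inextensional": p + 3, "kirchhoff": p + 1}
    return sum(dof.values()), strain_degree, modes


for p in (1, 2, 3):
    total, degrees, modes = element_profile(p)
    print(p, total, degrees, modes)
```

For $p = 1$ the initial element has 10 dof; enforcing the three higher-degree inextensional modes and the one higher-degree Kirchhoff mode removes four of them, leaving a six-dof element with constant strains.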

REMARK 3.1 It should be realized that requirements (3.6) are not limited to shallowly curved elements alone. Rather, they apply to all curved beams. For example, for a circular-arch element of radius $R$ the second membrane strain component in (3.4) is $\varepsilon_w = w/R$. According to (3.6), the interpolation polynomial for $u$ should be one degree higher than that for $w$.

REMARK  3.2  Note  that  each  penalty  mode  in  (3.10)  and  (3.11)  contains 
contributions  of  at  least  two  displacement  variables  (i.e.,  a  'true' 
penalty  mode  [20]).  This  means  that  proper  interdependence  of  the  vari¬ 
ables  can  be  achieved  in  the  thin  limit,  producing  a  nonlocking  solu¬ 
tion.  Unfortunately,  the  true  penalty-mode  structure  can  be  destroyed 
by  boundary  condition  restraints.  For  instance,  when  an  element  is 
subject  to  an  excessive  number  of  restraints,  an  initially  true  (cou¬ 
pled)  penalty  mode  takes  a  spurious  (uncoupled)  form,  thus  causing 
locking  [23].  This  deficiency  of  the  conventional  approach  is  particu¬ 
larly  pronounced  in  two-dimensional  plate/shell  problems,  where  a  single 
element  along  plate  boundary  is  often  subject  to  a  large  number  of 
kinematic  restraints  [21].  We  shall  further  elaborate  on  this  aspect  in 
Section  5. 

IV. SIMPLEST ELEMENT. According to the anisoparametric interpolation strategy just described, the simplest first-order ($p = 1$) element is a two-node (six dof) beam in which all three strain components are chord-wise constant. The initial C° interpolations are quartic $u$, quadratic $w$, and linear $\theta$. For convenience, we cast them in terms of Lagrange polynomials:

$$u = u_0 + \sum_{i=1}^{4} \bar u_i\,\psi_i(\eta), \qquad w = w_0 + \sum_{i=1}^{2} \bar w_i\,\psi_i(\eta), \qquad \theta = \theta_0 + \bar\theta_1\,\psi_1(\eta), \tag{4.1}$$

in which $u_i$, $w_i$, and $\theta_i$ are the nodal dof (refer to Figure 2), and $\bar u_i$, $\bar w_i$, and $\bar\theta_i$ are the generalized coordinates given by

$$\begin{aligned}
\bar u_1 &= u_1 - u_0, \qquad \bar w_1 = w_1 - w_0, \qquad \bar\theta_1 = \theta_1 - \theta_0,\\
\bar u_2 &= u_2 - (u_1 + u_0)/2, \qquad \bar w_2 = w_2 - (w_1 + w_0)/2,\\
\bar u_3 &= u_3 + (u_0 - 3u_1 - 6u_2)/8, \qquad \bar u_4 = u_4 - (u_0 + u_1 + 6u_2 - 4u_3)/4
\end{aligned} \tag{4.2}$$

and $\psi_i$ denote the Lagrange shape functions:

$$\psi_1 = \eta, \qquad \psi_2 = 4\eta(1-\eta), \qquad \psi_3 = \tfrac{32}{3}\,\eta(1-\eta)(2\eta-1),$$

$$\psi_4 = \tfrac{16}{3}\,\eta(1-\eta)(2\eta-1)(4\eta-3), \qquad \psi_j(\eta_i) = \begin{cases} 0 & \text{if } j > i\\ 1 & \text{if } j = i \end{cases} \tag{4.3}$$
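
The hierarchical character of these functions can be confirmed with exact arithmetic. In the sketch below the node positions $\eta_0,\ldots,\eta_4 = 0, 1, 1/2, 3/4, 1/4$ are an assumption inferred from the zeros of $\psi_i$ (they are not stated in the text), and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Assumed node positions eta_0 .. eta_4, inferred from the zeros of psi_i.
NODES = [F(0), F(1), F(1, 2), F(3, 4), F(1, 4)]


def psi(i, n):
    """Hierarchical Lagrange shape functions of (4.3)."""
    if i == 1:
        return n
    if i == 2:
        return 4 * n * (1 - n)
    if i == 3:
        return F(32, 3) * n * (1 - n) * (2 * n - 1)
    if i == 4:
        return F(16, 3) * n * (1 - n) * (2 * n - 1) * (4 * n - 3)
    raise ValueError("shape function index must be 1..4")


def generalized(u):
    """Generalized coordinates (4.2) from the five nodal values u[0..4]."""
    u0, u1, u2, u3, u4 = u
    return [u1 - u0,
            u2 - (u1 + u0) / 2,
            u3 + (u0 - 3 * u1 - 6 * u2) / 8,
            u4 - (u0 + u1 + 6 * u2 - 4 * u3) / 4]


def interpolate(u, n):
    """Evaluate the quartic interpolation (4.1) at eta = n."""
    ub = generalized(u)
    return u[0] + sum(ub[i - 1] * psi(i, n) for i in range(1, 5))


nodal = [F(3), F(-1), F(5), F(2), F(7)]
print([interpolate(nodal, n) for n in NODES])
```

With these node positions the interpolation returns the nodal values themselves, so (4.2) and (4.3) are mutually consistent.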


To arrive at a constant strain element, we enforce explicitly the linear, quadratic, and cubic inextensional modes and the linear Kirchhoff mode, condensing out the $\bar u_2$, $\bar u_3$, $\bar u_4$ and $\bar w_2$ coordinates. This dof reduction procedure is analogous to that of [17], but it requires fewer algebraic manipulations because of the hierarchical structure of the Lagrange functions. The implementation is as follows:


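(1) Explicitly enforce three higher-order inextensional modes:

$$\left\{\frac{\partial}{\partial x},\ \frac{\partial^2}{\partial x^2},\ \frac{\partial^3}{\partial x^3}\right\}\varepsilon = 0, \tag{4.4}$$

which result in

$$\begin{Bmatrix} \bar u_2\\ \bar u_3\\ \bar u_4 \end{Bmatrix} = \begin{bmatrix} (\beta_0 - \beta_1)\,\ell_{12} & (\beta_0 + \beta_1)\,\ell_{12}\\ -6(\beta_0 + \beta_1)\,\ell_{13} & (7\beta_0 - 25\beta_1)\,\ell_{24}\\ 0 & -18(\beta_0 + \beta_1)\,\ell_{24} \end{bmatrix} \begin{Bmatrix} \bar w_1\\ \bar w_2 \end{Bmatrix}, \tag{4.5}$$

where coefficients $\ell_{pq}$ are computed from

$$\ell_{pq} = \frac{\partial^p \psi_p/\partial\eta^p}{\partial^q \psi_q/\partial\eta^q},$$

or

$$\ell_{12} = -1/8, \qquad \ell_{13} = -\ell_{24} = -1/128.$$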
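
The three coefficients can be recovered from the leading terms of the shape functions, since the $i$-th derivative of the degree-$i$ polynomial $\psi_i$ is $i!$ times its leading coefficient. The sketch assumes the ratio form $\ell_{pq} = (\partial^p\psi_p/\partial\eta^p)/(\partial^q\psi_q/\partial\eta^q)$, which is a reading of the garbled source formula; the names are illustrative.

```python
from fractions import Fraction as F
from math import factorial

# Leading coefficients of psi_1 .. psi_4 in powers of eta, expanded from (4.3).
LEAD = {1: F(1), 2: F(-4), 3: F(32, 3) * -2, 4: F(16, 3) * -8}


def top_derivative(i):
    """d^i psi_i / d eta^i, a constant because psi_i has degree i."""
    return factorial(i) * LEAD[i]


def ell_coeff(p, q):
    """Ratio of the highest derivatives of psi_p and psi_q."""
    return top_derivative(p) / top_derivative(q)


print(ell_coeff(1, 2), ell_coeff(1, 3), ell_coeff(2, 4))
```

The sign of $\ell_{12}$ matters: with $\ell_{12} = -1/8$ the first row of (4.5) cancels the linear term of the membrane strain.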

(2) Explicitly enforce the linear Kirchhoff mode:

$$\frac{\partial\gamma}{\partial x} = 0, \tag{4.6}$$

resulting in

$$\bar w_2 = -\ell\,\bar\theta_1/8. \tag{4.7}$$

Introducing (4.6) and (4.7) into (4.5), and then substituting (4.5) into the initial displacement assumptions (4.1), gives the constrained interpolations for $u$ and $w$ in terms of the end-node dof, while leaving $\theta$ unaltered:


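$$\theta = \sum_{i=0}^{1} N_i\,\theta_i, \qquad w = \sum_{i=0}^{1}\left[N_i\,w_i + K_i\,\theta_i\right], \qquad u = \sum_{i=0}^{1}\left[N_i\,u_i + L_i\,w_i + M_i\,\theta_i\right], \tag{4.8}$$

in which

$$N_1 = 1 - N_0 = \psi_1,$$

$$L_0 = -L_1 = \tfrac{1}{8}\left[(\beta_0 - \beta_1)\,\psi_2 - \tfrac{3}{8}(\beta_0 + \beta_1)\,\psi_3\right],$$

$$M_0 = -M_1 = \tfrac{\ell}{64}\left[-(\beta_0 + \beta_1)\,\psi_2 + \tfrac{7\beta_0 - 25\beta_1}{16}\,\psi_3 - \tfrac{9}{8}(\beta_0 + \beta_1)\,\psi_4\right],$$

$$K_0 = -K_1 = \tfrac{\ell}{8}\,\psi_2.$$

Interpolations (4.8) preserve interelement displacement compatibility and satisfy the constant strain criterion [27]:

$$\sum_{i=0}^{1} N_i = 1, \qquad \sum_{i=0}^{1}(N_i + K_i) = 1, \qquad \sum_{i=0}^{1}(N_i + L_i + M_i) = 1, \tag{4.9}$$

embodying three rigid-body modes. For a straight element ($\beta_0 = \beta_1 = 0$) the membrane displacement reduces from a quartic to a linear chord-wise variation, which is commonly used.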

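The constant strain components are found as

$$\varepsilon = \frac{1}{l}(u_1 - u_0) + (p_1 - p_0)(\theta_1 - \theta_0)/12,$$

$$\kappa = \frac{1}{l}(\theta_1 - \theta_0), \qquad \gamma = \frac{1}{l}(w_1 - w_0) - (\theta_1 + \theta_0)/2. \qquad (4.10)$$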

Letting  the  strains  vanish  simultaneously  results  in  the  rigid-body  dis¬ 
placements: 

8-0,  w  -  w0  +  1^0, 
u  -  u  -  tLo0  , 
(4.11)


u  =  u0  =  ulf  9=00=  01. 

Note that curvature $\kappa$ and shear strain $\gamma$ in (4.10) are the same as those
for the two-node straight Timoshenko element [17]. As discussed in
[17], these quantities are identical to the corresponding strains for a
linear, single-point quadrature element of Hughes et al. [18].

We have thus derived a simple displacement field satisfying the standard
convergence criteria and the true penalty-mode requirement. Because all
strains are chord-wise constant, exact energy integrals are readily
computed either analytically or using a single-point Gaussian quadrature.
The derivations of the stiffness and consistent mass matrices and
consistent load vector are straightforward: simply substitute (4.8) into
(2.5) and perform the required variational operation. The element matrices
are given in Appendix A.

V. PENALTY RELAXATION. As mentioned previously, the penalty
constraints are conventionally enforced at the element level.
Consequently, consistent kinematic coupling in the penalty modes is paramount
in order to avoid locking. However, even when such coupling is properly
maintained, stiffer than desired deformations are expected, particularly
in coarsely discretized models. Furthermore, circumstances may arise
when a single element is subject to an excessive number of displacement
boundary restraints. For instance, if three out of four bending dof are
fixed in our two-node element (e.g., $w_0 = w_1 = \theta_0 = 0$), shear locking
occurs, since the Kirchhoff constraint $\gamma \to 0$ then takes the spurious
form $\theta_1 \to 0$ (refer to (4.10)).

A rational and effective way of resolving these deficiencies over
the entire theoretical range of $l/r$ is to relax the strict enforcement
of penalty modes at the element level. Recall that the analytic theory
requires zero-strain penalty constraints at the global structure level.
Using appropriate correction (relaxation) parameters in the form of
multipliers of the penalty stress resultants (i.e., the membrane and
shear forces), it is possible to relax the element level constraints,
yet retain their validity at the global structure level. In the limit
as the element size diminishes to zero, these parameters should approach
unity, since no correction is needed. By a matching procedure involving
exact and finite element energy solutions, Tessler and Hughes
[22,23] derived the shear correction parameters for straight beam and
flat plate elements. Herein, we adopt the same general form [22] for
the correction parameters, i.e.,

$$\phi_i^2 = (1 + C_i\alpha_i)^{-1} \quad (i = m, s) \qquad (5.1)$$

which for the curved beam element equal:

$$\phi_m^2 = \frac{1}{1 + C_m (l/r)^2}, \qquad \phi_s^2 = \frac{1}{1 + C_s \tfrac{1}{4} k^2 (G/E)(l/r)^2}$$



where $C_m$ and $C_s$ are the element constants established permanently from
simple numerical experiments. Employing (5.1), the element constitutive
relations are corrected accordingly:

«e-  *1  v-  (5-2) 

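Substituting (5.2) into the strain energy in (2.5) produces the corrected
element penalty parameters:

$$\alpha_i^e = \alpha_i\phi_i^2 \quad (i = m, s), \qquad (5.3)$$

or in the expanded form

$$\alpha_m^e = \frac{(l/r)^2}{1 + C_m (l/r)^2}, \qquad \alpha_s^e = \frac{\tfrac{1}{4} k^2 (G/E)(l/r)^2}{1 + C_s \tfrac{1}{4} k^2 (G/E)(l/r)^2}$$

For the two extrema of the slenderness ratio, $l/r \to \{\infty, 0\}$, we have

$$\alpha_i^e \to \begin{cases} 1/C_i & \text{if } l/r \to \infty \\ \alpha_i & \text{if } l/r \to 0 \end{cases} \quad (i = m, s) \qquad (5.4)$$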

One  major  advantage  with  this  approach,  in  addition  to  achieving 
speedier  monotonic  convergence  (see  numerical  results  in  Section  6),  is 
that  for  any  value  of  l/r  the  stiffness  matrix  is  well-conditioned. 
This  feature  allows  efficient  computations  on  microcomputers  having  a 
short  word  length.  By  contrast,  the  conventional  approach  yields  an 
ill-conditioned  stiffness  matrix  when  l/r  is  large,  in  which  case  high 
precision  computations  are  required  to  avoid  ill-conditioning  errors. 
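The bounded behavior of the relaxed penalty parameters can be checked numerically. The sketch below evaluates (5.1) and (5.3) over a wide range of $l/r$; the function name and the factor 1/4 in the shear penalty are assumptions made for illustration (the factor is read from a partly illegible expression), not a statement of the authors' implementation.

```python
import math

def corrected_penalties(l_over_r, C_m=0.25, C_s=1.0 / 3.0,
                        k2=math.pi**2 / 12.0, G_over_E=1.0 / 2.6):
    """Return (alpha_m_e, alpha_s_e), the relaxed element penalty parameters."""
    alpha_m = l_over_r**2                          # membrane penalty parameter
    alpha_s = 0.25 * k2 * G_over_E * l_over_r**2   # shear penalty (1/4 factor assumed)
    phi_m2 = 1.0 / (1.0 + C_m * alpha_m)           # membrane relaxation, form of (5.1)
    phi_s2 = 1.0 / (1.0 + C_s * alpha_s)           # shear relaxation, form of (5.1)
    return alpha_m * phi_m2, alpha_s * phi_s2      # corrected penalties, (5.3)

if __name__ == "__main__":
    for ratio in (1e-2, 1.0, 1e2, 1e6):
        a_m, a_s = corrected_penalties(ratio)
        print(f"l/r = {ratio:8.0e}: alpha_m^e = {a_m:.6g}, alpha_s^e = {a_s:.6g}")
```

For very slender elements the corrected penalties level off near $1/C_m$ and $1/C_s$ instead of growing like $(l/r)^2$, which is the mechanism behind the well-conditioned stiffness matrix noted above.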

VI. NUMERICAL RESULTS. The modeling capabilities of the two-node,
constant-strain element are illustrated through the solutions to static
and free vibration arch and ring problems. The examples are specifically
selected to test the shallowly curved element in the critical
applications to moderately deep and highly deep arches, ranging in the
slenderness from moderately thick to extremely thin. Uniform meshes are
used throughout, and the results are normalized with respect to the
appropriate Bernoulli-Euler solutions. Unless stated otherwise, $E/G = 2.6$
and $k^2 = \pi^2/12$.

Optimal $C_s$ and $C_m$ values. Our criterion for establishing suitable $C_s$
and $C_m$ values rests on the requirement of a rapid monotonic convergence
in the energy. In the case of a straight Timoshenko element [17,22],
$C_s = 1/3$ evolves as a natural choice, since it yields an exact value of
the potential energy (even with a single element!) for the problem of a
cantilever beam under a tip load (refer to Figure 4 and Table 1). To
obtain $C_m$, we solved a semicircular clamped arch under a central load,
using models with $C_s = 1/3$ and $C_m = 0$, 1/5, 1/4 and 1/3. From the
convergence of the tip deflection, which is a direct measure of the
potential energy for this problem, $C_m = 1/4$ appears nearly optimal.
Henceforth, $C_s = 1/3$ and $C_m = 1/4$ are adopted permanently for this
element. Note that regardless of the $C_s$ and $C_m$ values, convergence is
attainable by mesh refinement. However, in coarse and/or excessively
restrained models [23], the optimal values yield notable solution
improvements.
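The single-element claim for $C_s = 1/3$ can be reproduced in a few lines. The following is a minimal sketch, assuming a straight element ($p_0 = p_1 = 0$), the strain definitions of (4.10), and a shear penalty $\alpha_s = D_s l^2/(4 D_b)$ (equivalent to $\tfrac14 k^2 (G/E)(l/r)^2$); function names and the numerical values are illustrative only.

```python
import math

def tip_deflection_one_element(P, l, D_b, D_s, C_s):
    """Tip deflection of a clamped straight beam modeled by one two-node element."""
    alpha_s = D_s * l**2 / (4.0 * D_b)       # assumed shear penalty parameter
    D_se = D_s / (1.0 + C_s * alpha_s)       # relaxed shear rigidity
    # Reduced stiffness for the free dof (w1, theta1); w0 = theta0 = 0 at the clamp.
    k_ww = D_se / l
    k_wt = -D_se / 2.0
    k_tt = D_b / l + D_se * l / 4.0
    det = k_ww * k_tt - k_wt**2
    return P * k_tt / det                    # Cramer's rule for the load (P, 0)

def tip_deflection_exact(P, l, D_b, D_s):
    """Timoshenko cantilever under a tip load: bending plus shear deflection."""
    return P * l**3 / (3.0 * D_b) + P * l / D_s

E, h, l, P = 1.0, 1.0, 10.0, 1.0             # illustrative values, unit width
G = E / 2.6
k2 = math.pi**2 / 12.0
D_b = E * h**3 / 12.0                        # bending rigidity EI
D_s = k2 * G * h                             # shear rigidity k^2 G A

if __name__ == "__main__":
    exact = tip_deflection_exact(P, l, D_b, D_s)
    for C_s in (0.0, 0.2, 1.0 / 3.0):
        ratio = tip_deflection_one_element(P, l, D_b, D_s, C_s) / exact
        print(f"C_s = {C_s:.4f}: w_fe / w_exact = {ratio:.6f}")
```

With $C_s = 1/3$ the relaxed shear flexibility adds exactly the $l^3/(12 D_b)$ bending term that a constant-curvature element lacks, so the ratio equals one; with $C_s = 0$ the element is too stiff.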

Element vs. Beam Penalty Constraints. To illustrate the effect of the
relaxation parameters on the element (local) and beam (global) strains
in the thin regime, consider a thin ($L/h = 10^3$), straight cantilever beam
under a tip load discretized with four elements. Three different
solutions corresponding to $C_s = 0$, 1/5, 1/3 are obtained. The element-level
shear strains are computed using (4.10). The beam-level shear strain is
obtained from (2.3), where $w$ and $\theta$ are exact polynomial fits to the
respective nodal values, i.e.,

$$w = \sum_{k=0}^{4} a_{wk}(x/L)^k, \qquad \theta = \sum_{k=0}^{4} a_{\theta k}(x/L)^k, \qquad (6.1)$$

where $a_{wk}$ and $a_{\theta k}$ are the exact fit coefficients for $w$ and $\theta$,
respectively. The corresponding shear strains, the relaxation values, and the
shear force resultants are summarized in Table 2. In accordance with
the exact solution for this problem, the strains are constant along the
span of the beam, from both the element-level and the beam-level
calculations. At the element level, all three $C_s$ values yield the correct
shear strain and shear force resultant. It is often argued that the
only 'natural' solution is obtained when the Kirchhoff constraint is
enforced at the element level (i.e., $C_s = 0$ or $\phi_s^2 = 1$). However, at the
global level, only $C_s = 1/3$ produces the correct shear strain and shear
force resultant. Without relaxation the global behavior is grossly
inaccurate. Importantly, it is the global enforcement of the Kirchhoff
constraint that is the only requirement of the analytic theory; and only
with relaxation is this global constraint possible.

Quarter-Circular Arch. Figure 5 depicts the convergence of the
displacements at load application for a very thin, clamped arch ($R/h = 10^4$)
which  may  be  regarded  as  moderately  deep.  The  convergence  curves  for 
the  maximum  stress  resultants  are  shown  in  Figure  6.  Recall  that  the 
stress  resultants  are  constant  across  the  element  chord.  Hence,  their 
magnitudes  are  attributed  to  the  center  of  the  element.  Rapid  conver¬ 
gence  of  the  displacement  and  stress  variables  is  evident.  Remarkably, 
the  stress  resultants  have  the  same  degree  of  accuracy  as  the  displace¬ 
ments. 

The  effect  of  element  slenderness  is  demonstrated  in  Figure  7, 
where  the  maximum  displacement  and  stress  variables  are  plotted  versus 
the  arch  radius-to-thickness  ratio  (R/h).  The  range  of  the  arch  slen¬ 
derness  parameter  is  taken  from  5  to  10*.  As  expected,  the  element  be¬ 
havior  is  exceptional  throughout  the  wide  slenderness  range,  having  no 
limitations  in  the  thin  regime  (i.e.  no  locking).  For  the  moderately 
thick arch ($R/h = 5$), the deviation from the Bernoulli-Euler solutions
is due to transverse shear, as it should be. Figure 8 shows the
variations of the correction parameters for the problem. These
illustrate the increased influence of these corrections for a diminishing
thickness. It is this influence which is mainly responsible for the
enhancement in element performance.

Semicircular  Arch.  Figure  9  shows  the  convergence  of  the  maximum  trans¬ 
verse  displacement  for  the  moderately  thick  to  very  thin  semicircular, 
centrally  loaded  clamped  arch.  Highly  accurate  results  are  evident, 
with  the  moderately  thick  case  showing  important  contributions  of  both 
shear  and  membrane  deformations  (and,  naturally,  deviating  from  the  ele¬ 
mentary  solutions). 

Deep  Arches.  The  element  modeling  of  very  deep/thin  arches  is  studied 
on three simple cases (refer to Figure 10): (a) a complete ring ($R/h = 10^3$)
pinched by two identical diagonal forces; (b) a $3\pi/2$-arch ($R/h = 10^3$)
clamped at one end and restrained horizontally at the end where a
vertical force is prescribed; (c) a $7\pi/4$-arch ($R/h = 23.48$) clamped at
both  ends  and  loaded  centrally  by  a  vertical  force.  The  latter  case  is 
only  moderately  thin,  but  presented  here  for  the  purpose  of  comparison 
with  solutions  obtained  in  [14]  for  this  problem.  Due  to  symmetry  in 
arches  (a)  and  (c)  only  one-quarter  and  one-half  of  the  arches  are 
discretized,  respectively.  In  all  three  cases  the  results  are  very 
accurate  with  only  a  few  elements.  Notably,  for  arch  (c)  the  present 
element  produces  more  accurate  results  than  any  of  the  thirteen  elements 
investigated  in  [14]. 

Vibration  of  Thin  Ring.  Natural  frequencies  of  vibration  for  a  thin 
ring  (R/h  *  10’)  are  computed  for  the  purpose  of  assessing  the  quality 
of  the  element  consistent  mass  matrix.  Again,  due  to  symmetry,  only 
one-quarter of the ring is discretized. Several distinct values of $C_m$
and $C_s$ are used to observe their effects on the frequency solutions.
The convergence curves for the fundamental frequency, shown in
Figure 11, illustrate the increased accuracy of curved-element modeling
over  straight  beam  approximations.  Importantly,  the  beneficial  effects 
of  the  membrane  and  shear  relaxations  are  also  evident,  giving  rise  to 
very  accurate  solutions  even  in  coarse  discretizations. 

In  Table  3  are  summarized  the  eight  lowest  symmetric  frequencies 
obtained  from  an  8-element  quarter-ring  model.  The  results  are  compared 
with the Bernoulli-Euler frequencies. As expected from a conforming
displacement  model,  the  finite  element  frequencies  converge  from  above; 
the  largest  error  is  only  1%  for  the  highest  mode. 

Vibration of Semicircular Arch. In this final example, natural
frequencies for a semicircular hinged arch are computed for the moderately
thick ($\pi R/r = 10$) and moderately thin ($\pi R/r = 50$) configurations. The
solutions, based on a full 60-element model, are compared with those of Dong
and Wolf [28], who used 30 straight three-noded Timoshenko elements,
having the same number of dof as our model. The results in Table 4 show
close agreement between the two solutions; however, the present
frequencies are consistently lower, hence more accurate. (It is worth
mentioning that the Dong-Wolf isoparametric quadratic element, while generally
accurate in the slenderness range of this example, exhibits locking in a
truly slender regime.)

VII. CONCLUSIONS. We have presented a displacement formulation
yielding simple, reliable, and efficient shallowly curved
Timoshenko/Marguerre elements. Particular attention was focused upon the
simplest element of the hierarchical family, a two-node, six
degrees-of-freedom beam with constant components of strain. The element's
main features are summarized as follows:


(1)  kinematic  conformability 

(2)  correct  stiffness  matrix  rank 

(3)  free  of  straining  rigid-body  motion 

(4)  consistent  load  vector  and  mass  matrix 

(5)  well-conditioned  stiffness  matrix 

(6)  nodal/dof  simplicity  and  computational  efficiency 

(7) no limitation with regard to thinness, i.e., no locking of any type

(8)  fast  monotonic  convergence  of  the  kinematic  and  kinetic 
variables 

(9)  ideal  microcomputer  suitability. 


The  methodology  could  potentially  be  used  to  construct  simple  and 
efficient  shear-deformable  curved  shell  models  that  would  be  compatible 
with  the  curved  beams  discussed  in  this  paper. 


APPENDIX A. The components of the element matrices given below
correspond to the nodal displacement vector $\{u_0, u_1, w_0, w_1, \theta_0, \theta_1\}$.
Exact explicit integration has been used throughout.

Stiffness  Matrix 


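$$k_{11} = k_{22} = -k_{12} = D_m^e/l, \qquad k_{26} = k_{15} = -k_{16} = -k_{25} = D_m^e\,\bar p,$$

$$k_{33} = k_{44} = -k_{34} = D_s^e/l, \qquad k_{35} = k_{36} = -k_{45} = -k_{46} = D_s^e/2,$$

$$k_{55} = k_{66} = D_m^e\, l\, \bar p^2 + D_b/l + D_s^e\, l/4, \qquad k_{56} = -k_{55} + D_s^e\, l/2 \qquad (A.1)$$

where

$$\bar p = (p_1 - p_0)/12,$$

$$D_i^e = \phi_i^2 D_i \ (i = m, s), \qquad D_m = AE, \qquad D_s = k^2 GA, \qquad D_b = EI,$$

$$\phi_m^2 = \frac{1}{1 + C_m (l/r)^2}, \qquad \phi_s^2 = \frac{1}{1 + C_s \tfrac{1}{4} k^2 (G/E)(l/r)^2},$$

$$C_m = 1/4, \qquad C_s = 1/3.$$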
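As a cross-check on the stiffness components, the matrix can be assembled directly from the three constant strains in (4.10), since the strain energy is $\tfrac12 l\,(D_m^e\varepsilon^2 + D_b\kappa^2 + D_s^e\gamma^2)$. The sketch below is an illustration under that assumption; the function names are hypothetical, and the rigidities passed in are taken to be the already relaxed values.

```python
def element_stiffness(l, p0, p1, D_m, D_b, D_s):
    """6x6 stiffness for the dof order (u0, u1, w0, w1, theta0, theta1)."""
    p_bar = (p1 - p0) / 12.0
    # Strain-displacement rows taken from the constant strains (4.10).
    B_eps = [-1.0 / l, 1.0 / l, 0.0, 0.0, -p_bar, p_bar]   # membrane strain
    B_kap = [0.0, 0.0, 0.0, 0.0, -1.0 / l, 1.0 / l]        # curvature
    B_gam = [0.0, 0.0, -1.0 / l, 1.0 / l, -0.5, -0.5]      # transverse shear
    K = [[0.0] * 6 for _ in range(6)]
    for D, B in ((D_m, B_eps), (D_b, B_kap), (D_s, B_gam)):
        for i in range(6):
            for j in range(6):
                K[i][j] += l * D * B[i] * B[j]             # exact: strains are constant
    return K

def rigid_body_modes(l):
    """Two translations and one small rotation about node 0."""
    return [
        [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],   # u translation
        [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],   # w translation
        [0.0, 0.0, 0.0, l, 1.0, 1.0],     # rotation: w1 = w0 + l*theta, theta1 = theta0
    ]

if __name__ == "__main__":
    K = element_stiffness(2.0, 0.1, 0.3, 5.0, 3.0, 7.0)
    for mode in rigid_body_modes(2.0):
        forces = [sum(K[i][j] * mode[j] for j in range(6)) for i in range(6)]
        print(max(abs(f) for f in forces))
```

The rigid-body modes produce no nodal forces, which is consistent with the element being free of straining under rigid-body motion.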

Consistent  Mass  Matrix 

mi i*m2 2"2m12"  m/3,  mls»-m14-  (3p0-2p1)m/60, 

•ni5"-n>i6"-(2Po+Pi)mt/4,  m2 j"-m24«  (2po-3pi)m/60, 
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*25— *26*  -(P0+2Pi)ml/4, 

m3i*®44“t (2po  "PoPi+2P2  ) / 70  +  l]m/3, 

*S4-  [ (~2pg  +P0Pi-2pi  ) / 35  +  l]m/6, 

*35-036-  [190!  -Po  )/420+l]ml/24, 

*45— *46*  "*35  +  mt/12, 

*s5-*66-  [ (5p§  +2p0p1+13p*  ) / 252  +  40(r/O2  +  l]mt2/120, 
m56*  -raS5  +  rat2/2,  m  *  pAi.  (A. 2) 

Consistent Load Vector due to Uniform Normal Pressure q

$$f_1 = f_2 = 0, \qquad f_3 = f_4 = q\,l/2, \qquad f_5 = -f_6 = q\,l^2/12. \qquad (A.3)$$

Figure 1. Shallowly curved beam element.

[Figure 2 chart: for each element order $p$, with $u = O(x^{p+3})$, $w = O(x^{p+1})$,
$\theta = O(x^{p})$, the chart lists the nodal pattern, the number of dof, and the
strain variations ($\varepsilon$ of degree $p+2$, $\kappa$ of degree $p-1$, $\gamma$ of degree $p$).
Key: • $u, w, \theta$; ▲ $u, w$; □ $u$.]

Figure 2. Initial (unconstrained) anisoparametric elements.

[Figure 3 chart. Key: • $u, w, \theta$.]

Figure 3. Explicitly constrained anisoparametric elements of uniform
nodal patterns and uniform strain variations.




Figure 4. Straight cantilever beam and semicircular arch. [Plot of
normalized tip displacements.]

Figure 5. Thin quarter-circular cantilever arch under tip load;
convergence of tip displacements. [Plotted against number of elements.]

Figure 6. Thin quarter-circular cantilever arch under tip load;
convergence of maximum stress resultants (N and M computed at center
of element closest to clamped end; Q computed at center of element
closest to load). [Plotted against number of elements.]

Figure 7. Quarter-circular cantilever arch under tip load; maximum
displacements and stress resultants versus arch radius-to-thickness
ratio.

Figure 8. Quarter-circular cantilever arch under tip load; membrane
and shear correction (relaxation) parameters versus arch
radius-to-thickness ratio.

Figure 9. Moderately thick and very thin semicircular centrally loaded
clamped arches; convergence of maximum transverse displacement.

Figure 10. Deep/thin arches; convergence of maximum transverse
displacement. [Plotted against number of elements.]

Figure 11. Thin ring; convergence of fundamental frequency.

Table 1. Determination of $C_s$ and $C_m$ constants for two-node element
via optimization of potential energy (exact value normalized to 1.000).


No. 

Straight  cantilever 

Semicircular  arch 

of 

(L/h-10\ 

C  -0) 

(R/h-10fc 

,  Cg«l/3) 

•1. 

C  *0 

1 

1 

1 

3 

C  -0 

1 

5 

1 

1 

s 

5 

4 

m 

4 

3 

2 

mm 

0.984 

1.000 

0.903 

0.961 

0.975 

0.999 

4 

0.989 

0.993 

0.995 

8 

0.999 

0.998 

0.998 

0.999 

0.999 

16 

m 

1.000 

1.000 

1.000 

1.000 

1.000 

1.000 

Table 2. Thin, straight cantilever beam under unit tip force
discretized with four elements. Element and beam shear
strains for various $C_s$ ($L = 100$, $h = 0.1$, $G = 15.385 \times 10^6$).


■ 

Element 

Beam 

E 

m 

TO 

Q-k’GA-,? 

0 

1.000 

-1.562-10*2 

-1.976*104 

1 

5 

8.432-10'5 

7.903-10'7 

1.000 

-6.249-10"3 

-7.905*103 

1 

3 

5.058-10*5 

7.903*10*7 

1.000 

-7 

-7 

Analytic 

7 .903« 10 

1.000 

7. 903*10 

1.000 

Table 3. Natural frequencies for thin circular ring,
$\omega_n = \bar\omega_n (EI/\rho A R^4)^{1/2}$.

Mode number   Bernoulli-Euler theory   Present element        % Error
     n            $\bar\omega_n$         $\bar\omega_n^{fe}$    $100(\bar\omega_n^{fe} - \bar\omega_n)/\bar\omega_n$

     2              2.6833                 2.6844               0.04
     4             14.552                 14.559                0.05
     6             34.524                 34.546                0.07
     8             62.514                 62.588                0.12
    10             98.509                 98.737                0.23
    12            142.51                 143.12                 0.43
    14            194.50                 195.97                 0.76
    16            254.50                 257.63                 1.23
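The theory column of Table 3 can be reproduced with the classical in-plane inextensional frequency formula for a thin ring, $\bar\omega_n = n(n^2-1)/\sqrt{n^2+1}$. That this is the formula behind the "Bernoulli-Euler theory" column is an assumption here, although the tabulated values agree with it to the printed precision. The sketch also recomputes the percentage errors from the rounded table entries.

```python
import math

def ring_frequency(n):
    """Classical nondimensional in-plane bending frequency of a thin ring."""
    return n * (n * n - 1) / math.sqrt(n * n + 1)

# (theory, present element) pairs as printed in Table 3.
TABLE_3 = {
    2: (2.6833, 2.6844), 4: (14.552, 14.559), 6: (34.524, 34.546),
    8: (62.514, 62.588), 10: (98.509, 98.737), 12: (142.51, 143.12),
    14: (194.50, 195.97), 16: (254.50, 257.63),
}

def percent_error(theory, fe):
    """Relative error of the finite element frequency, in percent."""
    return 100.0 * (fe - theory) / theory

if __name__ == "__main__":
    for n, (theory, fe) in TABLE_3.items():
        print(f"n = {n:2d}: formula = {ring_frequency(n):9.4f}, "
              f"table = {theory:9.4f}, error = {percent_error(theory, fe):.2f}%")
```

All finite element frequencies lie above the theoretical ones, in line with the convergence-from-above remark in the text.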

Table 4. Natural frequencies for moderately thick and thin
semicircular hinged arches, $\omega_n = \bar\omega_n (EI/\rho\pi^4 A R^4)^{1/2}$.

Slenderness   Mode      Dong & Wolf [28]     Present element   % Difference
ratio         number
$\pi R/r$        n       $\bar\omega_n^{DW}$     $\bar\omega_n$     $100(\bar\omega_n^{DW} - \bar\omega_n)/\bar\omega_n^{DW}$

   10           1          12.87                12.79              0.62
                2          24.69                24.64              0.20
                3          38.13                37.98              0.39
                4          40.01                39.90              0.27
                5          57.62                57.13              0.85
                6          60.42                59.87              0.91
                7          60.83                60.47              0.59
                8          73.40                72.80              0.82
                9          81.03                80.74              0.36
               10          85.18                84.74              0.52

   50           1          21.60                21.53              0.32
                2          62.57                62.35              0.35
                3         120.8                120.3               0.41
                4         146.3                146.4              -0.07
                5         199.6                199.2               0.20
                6         212.0                212.1              -0.05
                7         278.2                277.7               0.18
                8         335.4                334.8               0.18
                9         368.8                368.9              -0.03
               10         439.5                438.2               0.03

ABSTRACT. Admissible elastic energy density functions for highly deformed,
compressible,  elastomer  solids  are  derived  from  the  geometric  -  arithmetic 
mean  value  inequality  theorem.  A  finite  element  model  based  on  these 
energies  is  proposed  for  triangular  elements  with  quadratic  displacement 
approximations.  The  computation  of  very  large  deformations  of  an  elastomer 
cylinder  is  performed. 


INTRODUCTION. The assumption of incompressibility for elastomers is
usually made for analytical convenience. For numerical finite element
computations [1] incompressibility is a disaster. Having pressure
independent of the displacements and including it in the elastic
variational formulation via Lagrange multipliers results in loss of the
all-important energy positive definiteness. Without a minimal variational
principle, finite element convergence and stability become precarious.

In  the  numerical  modeling  of  nearly  incompressible  elastic  solids 
pressure  is  computed  from  dilatation  with  the  bulk  modulus.  Since  the  bulk 
modulus  is  large  there  are  serious  computational  difficulties  associated 
with  this  approach.  Decline  in  the  condition  of  the  system  of  stiffness 
equations  is  clearly  identified  [2]  in  its  dependence  on  the  formulation 
and  discretization  parameters.  The  adverse  effect  of  a  large  bulk  modulus 
on  the  approximation  accuracy  of  the  finite  element  model  is  also 
understood.  The  modeling  of  near  incompressibility  must  be  balanced  [3] 
with  the  computational  considerations  of  numerical  stability  and  accuracy. 

This  paper  has  three  points  to  make.  The  first  is  to  show  how  the 
arithmetic  -  geometric  -  mean  inequality  theorem  rationally  leads  to  most 
of  the  elastic  energy  density  functions  in  use  for  the  large  displacement 
analysis  of  nearly  incompressible  solids  (typically  elastomers).  The 
second is to estimate, using a uniaxial stretching problem, the upper limit
for the bulk modulus needed to reproduce the observed residual
compressibility of rubberlike (elastomer) solids. The third point is to
develop  a  quadratic  six  node  finite  element  for  the  analysis  of  elastomers 
which  uses  the  positive  definite  energy  density  functions  derived  in  the 
first  part. 


POSITIVE  DEFINITE  ENERGY  DENSITY  FUNCTIONS.  We  turn  now  to  the  question  of 
the  specific  form  for  the  energy  density  function  w.  We  will  show  that 
modifications  to  the  most  widely  used  energy  density  functions  can  be  made 
using  the  following  theorem  [4]  so  that  they  represent  positive  definite 
functionals  for  any  combination  of  stretch  ratios. 


Arithmetic-Geometric-Mean Inequality Theorem:

Let $a, b, c$ be positive real numbers. Then

$$\tfrac{1}{3}(a+b+c) \ge (abc)^{1/3} \qquad (1)$$

where equality holds only when $a = b = c$. END THM.


Hence the function

$$F(a,b,c) = a+b+c-3(abc)^{1/3} \qquad (2)$$

is positive semidefinite.

To apply this theorem to our purpose we consider three dimensional finite
elasticity and write $\lambda_1$, $\lambda_2$, $\lambda_3$ for the principal stretch ratios. Then,
for any real $r$ and $a = \lambda_1^r$, $b = \lambda_2^r$, $c = \lambda_3^r$ we have from equation (2) that

$$F_r = \lambda_1^r + \lambda_2^r + \lambda_3^r - 3(\lambda_1\lambda_2\lambda_3)^{r/3} \qquad (3)$$

is positive semidefinite and can be taken as a summand in an energy
density function for an isotropic compressible solid. When the solid is
perfectly incompressible, $\lambda_1\lambda_2\lambda_3 = 1$ and

$$F_r = \lambda_1^r + \lambda_2^r + \lambda_3^r - 3 \qquad (4)$$

but the stretch ratios in this expression are not independent and $F_r$ is not
positive definite unless one of the stretch ratios is expressed in terms of
the other two through $\lambda_1\lambda_2\lambda_3 = 1$. For instance, if $\lambda_3$ is eliminated from
$F_r$ in equation (4) by $\lambda_3 = 1/(\lambda_1\lambda_2)$, then

$$F_r = \lambda_1^r + \lambda_2^r + 1/(\lambda_1\lambda_2)^r - 3 \qquad (5)$$

is such that $F_r \ge 0$ if $\lambda_1 > 0$ and $\lambda_2 > 0$; and $F_r = 0$ only when $\lambda_1 = \lambda_2 = 1$.

In the compressible case when $r = 1$

$$F_1 = \lambda_1+\lambda_2+\lambda_3-3(\lambda_1\lambda_2\lambda_3)^{1/3} \qquad (6)$$

corresponds to the incompressible expression for the elastic energy density
function of Varga [5]. A more general energy density functional, $w$, is
conveniently written as a polynomial of $F_1$

$$w = c_1F_1 + c_2F_1^2 + \cdots + c_nF_1^n, \qquad c_i > 0 \qquad (7)$$

Choosing $r = 2$ in equation (3) results in

$$F_2 = \lambda_1^2+\lambda_2^2+\lambda_3^2-3(\lambda_1\lambda_2\lambda_3)^{2/3} \qquad (8)$$

which is the generalization of the neo-Hookean expression for a compressible
solid, while $r = -2$ produces

$$F_{-2} = \lambda_1^{-2}+\lambda_2^{-2}+\lambda_3^{-2}-3(\lambda_1\lambda_2\lambda_3)^{-2/3} \qquad (9)$$

so that

$$w = c_1F_2 + c_2F_{-2} \qquad (10)$$

is the compressible counterpart to the Mooney-Rivlin energy density
function.
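The semidefiniteness of $F_r$ is easy to probe numerically. The sketch below evaluates (3) and the compressible Mooney-Rivlin counterpart (10) for arbitrary positive stretches; the function names are illustrative and not taken from the paper.

```python
import random

def F_r(lam1, lam2, lam3, r):
    """Equation (3): sum of r-th powers minus three times the geometric mean term."""
    return lam1**r + lam2**r + lam3**r - 3.0 * (lam1 * lam2 * lam3) ** (r / 3.0)

def w_mooney_rivlin_compressible(lam1, lam2, lam3, c1, c2):
    """Equation (10): w = c1*F_2 + c2*F_{-2}, with c1, c2 > 0."""
    return c1 * F_r(lam1, lam2, lam3, 2.0) + c2 * F_r(lam1, lam2, lam3, -2.0)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        lams = [rng.uniform(0.2, 5.0) for _ in range(3)]
        print(lams, w_mooney_rivlin_compressible(*lams, c1=1.0, c2=0.1))
    # Equal stretches (pure dilatation) give zero energy, whatever their value.
    print(w_mooney_rivlin_compressible(2.0, 2.0, 2.0, c1=1.0, c2=0.1))
```

The last line illustrates why these functions are only semidefinite: a uniform dilatation stores no energy unless a volumetric term is added.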

The strain invariants

$$I_1 = \lambda_1^2+\lambda_2^2+\lambda_3^2$$
$$I_2 = \lambda_1^2\lambda_2^2+\lambda_2^2\lambda_3^2+\lambda_3^2\lambda_1^2 \qquad (11)$$
$$I_3 = \lambda_1^2\lambda_2^2\lambda_3^2$$

are commonly used in the literature on rubber elasticity. We can use the
strain invariants as follows to construct energy density functionals.
Setting $a = \lambda_1^2$, $b = \lambda_2^2$, $c = \lambda_3^2$ we obtain the inequality

$$I_1 - 3I_3^{1/3} \ge 0 \qquad (12)$$

with equality holding if and only if $\lambda_1 = \lambda_2 = \lambda_3$. In the same manner,
setting $a = \lambda_1^2\lambda_2^2$, $b = \lambda_2^2\lambda_3^2$, $c = \lambda_3^2\lambda_1^2$ produces the second inequality

$$I_2 - 3I_3^{2/3} \ge 0 \qquad (13)$$

with equality holding if and only if $\lambda_1 = \lambda_2 = \lambda_3$.

The modified Mooney-Rivlin energy density function

$$w = c_1(I_1 - 3I_3^{1/3}) + c_2(I_2 - 3I_3^{2/3}), \qquad c_1, c_2 > 0 \qquad (14)$$

is then only positive semidefinite; $w = 0$ when $\lambda_1 = \lambda_2 = \lambda_3$ even when
$\lambda_1 \ne 1$. To have a positive definite energy we have to add to it a
dilatational contribution representing the energy stored in compression. We
use

$$w = \tfrac{1}{2}\kappa[\ln(\lambda_1\lambda_2\lambda_3)]^2 \qquad (15)$$

where $\kappa$ is the bulk modulus.

Ogden's energy density function [6-8] is essentially $F_r$ with $r$ being
rational and fitted to experimental data.

The Valanis-Landel energy density function [9]

$$w = 2\mu\sum_{i=1}^{3}\lambda_i(\ln\lambda_i - 1) + \tfrac{1}{2}\kappa[\ln(\lambda_1\lambda_2\lambda_3)]^2 \qquad (16)$$

is not derivable from the arithmetic-geometric-mean inequality theorem but
is based on the inequality

$$\lambda\ln\lambda \ge \lambda - 1, \qquad \lambda > 0 \qquad (17)$$

where equality holds only when $\lambda = 1$. Figure 1 shows the variation of

$$\phi(\lambda) = \lambda\ln\lambda - \lambda + 1 \qquad (18)$$

with $\lambda$, and indeed $\phi(\lambda) > 0$ when $\lambda > 0$ and $\lambda \ne 1$. Hence the deviatoric
part, factored by $2\mu$, is positive definite and one may set $\kappa = 0$ in
equation (16) for the Valanis-Landel energy density function.
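A short numerical check of inequality (17) may help. The sketch evaluates $\phi(\lambda)$ from (18) and a deviatoric Valanis-Landel energy written as $2\mu\sum\phi(\lambda_i)$; shifting each summand by the constant 1 so that the energy vanishes in the undeformed state is an assumption made here for illustration.

```python
import math

def phi(lam):
    """Equation (18): phi(lambda) = lambda*ln(lambda) - lambda + 1, for lambda > 0."""
    return lam * math.log(lam) - lam + 1.0

def w_valanis_landel_deviatoric(lams, mu):
    """2*mu*sum(phi(lambda_i)); zero only when every stretch equals one."""
    return 2.0 * mu * sum(phi(lam) for lam in lams)

if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"phi({lam}) = {phi(lam):.6f}")
    print(w_valanis_landel_deviatoric((1.0, 1.0, 1.0), mu=1.0))
```

Since $\phi'(\lambda) = \ln\lambda$ changes sign only at $\lambda = 1$, the function has its single minimum, zero, at the undeformed state.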

NEAR INCOMPRESSIBILITY. Suppose that we choose to represent the elastic
behavior of our compressible rubber by the modified Mooney-Rivlin energy
density function

$$w = c_1(I_1 - 3I_3^{1/3}) + c_2(I_2 - 3I_3^{2/3}) + \tfrac{1}{2}\kappa[\ln(\lambda_1\lambda_2\lambda_3)]^2 \qquad (19)$$

or by that of Valanis-Landel in equation (16). How should we select the
bulk modulus $\kappa$ to accommodate the observed compressibility of rubber? It
is important in the finite element modeling of rubber to keep the bulk
modulus as low as accuracy demands will allow. To answer the question we
shall numerically solve the uniaxial tension problem with the different
energy density functions and estimate the bulk modulus needed to reproduce
the dilatational experiments of Penn [10].

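Consider the simple tension bar of Figure 2 having a cross sectional
area $\epsilon \ll 1$ and unit length. Stretched by an axial force $P$ the bar
extends to length $\lambda_3$ with cross sectional area $\epsilon\lambda_1^2$. The potential energy
of the stretched bar is given by

$$\pi = \epsilon\, w(\lambda_1, \lambda_3) - P(\lambda_3 - 1) \qquad (20)$$

and the two equations of equilibrium

$$\frac{\partial\pi}{\partial\lambda_1} = 0 \quad \text{and} \quad \frac{\partial\pi}{\partial\lambda_3} = 0 \qquad (21)$$

become

$$\frac{\partial w}{\partial\lambda_1} = 0 \quad \text{and} \quad \frac{\partial w}{\partial\lambda_3} - \sigma = 0 \qquad (22)$$

where $\sigma = P/\epsilon$ is the axial stress.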

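The first of equations (22) becomes for the Mooney-Rivlin material

$$0 = (\lambda_1^4 - \lambda_1^2 I) + \left(\frac{c_2}{c_1}\right)(\lambda_1^6 + I^3 - 2\lambda_1^2 I^2) + \frac{3}{4}\lambda_1^2\left(\frac{\kappa}{c_1}\right)\ln(I) \qquad (23)$$

where $I = (\lambda_1^2\lambda_3)^{2/3}$.

For any given $\lambda_1 > 0$ equation (23) is solved for $I$, and hence for $\lambda_3$, by the
Newton-Raphson method.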
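The Newton-Raphson step just described can be sketched as follows. The residual is equation (23) as obtained from the energy (19) with $\lambda_2 = \lambda_1$; the function names, starting guess, and tolerance are illustrative assumptions, and the default ratios are those quoted for the Mooney-Rivlin curve of Figure 3.

```python
import math

def residual(I, lam1, c2_over_c1, kappa_over_c1):
    """Right-hand side of equation (23) for a trial value of I."""
    return (lam1**4 - lam1**2 * I
            + c2_over_c1 * (lam1**6 + I**3 - 2.0 * lam1**2 * I**2)
            + 0.75 * lam1**2 * kappa_over_c1 * math.log(I))

def solve_I(lam1, c2_over_c1=0.1, kappa_over_c1=1350.0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration on equation (23), starting from I = 1."""
    I = 1.0
    for _ in range(max_iter):
        f = residual(I, lam1, c2_over_c1, kappa_over_c1)
        df = (-lam1**2
              + c2_over_c1 * (3.0 * I**2 - 4.0 * lam1**2 * I)
              + 0.75 * lam1**2 * kappa_over_c1 / I)
        step = f / df
        I -= step
        if abs(step) < tol:
            return I
    raise RuntimeError("Newton-Raphson did not converge")

def axial_stretch(lam1, c2_over_c1=0.1, kappa_over_c1=1350.0):
    """Recover lambda_3 from I = (lam1^2 * lam3)^(2/3)."""
    I = solve_I(lam1, c2_over_c1, kappa_over_c1)
    return I**1.5 / lam1**2

if __name__ == "__main__":
    for lam1 in (1.0, 0.9, 0.8, 0.7):
        lam3 = axial_stretch(lam1)
        print(f"lam1 = {lam1:.2f}, lam3 = {lam3:.6f}, dV/V = {lam1**2 * lam3 - 1.0:.3e}")
```

Because the logarithmic term dominates for a large bulk modulus, the iteration started at $I = 1$ (the incompressible value) converges in a few steps.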

For the Valanis-Landel function the first of equations (22) produces

$$\ln(\lambda_1^2\lambda_3) = -\frac{2\mu}{\kappa}\lambda_1\ln\lambda_1 \qquad (24)$$

that yields $\lambda_3$ for given values of $\lambda_1$.

Figure 3 traces the relative volume change $\Delta = \lambda_1^2\lambda_3 - 1$ vs $\lambda_3$ for the
Valanis-Landel function with $\kappa/2\mu = 1000$, and the Mooney-Rivlin model with
$c_2/c_1 = 0.1$ and $\kappa/c_1 = 1350$, compared with the experiments of Penn. These
values of $\kappa$ are reasonable upper limits for numerical modeling of
nearly incompressible rubber.
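For the Valanis-Landel material no iteration is needed, since (24) gives the volume ratio in closed form. The sketch below traces the relative volume change for the quoted ratio $\kappa/2\mu = 1000$; the function name is illustrative.

```python
import math

def vl_uniaxial(lam1, kappa_over_2mu=1000.0):
    """Return (lambda_3, relative volume change) from equation (24)."""
    log_J = -lam1 * math.log(lam1) / kappa_over_2mu   # ln(lam1^2 * lam3)
    J = math.exp(log_J)                               # volume ratio lam1^2 * lam3
    return J / lam1**2, J - 1.0

if __name__ == "__main__":
    for lam1 in (1.0, 0.9, 0.8, 0.7, 0.6):
        lam3, dV = vl_uniaxial(lam1)
        print(f"lam1 = {lam1:.2f}, lam3 = {lam3:.5f}, dV/V = {dV:.3e}")
```

In tension the lateral stretch $\lambda_1$ is below one, so $\ln\lambda_1 < 0$ and the predicted volume change is small and positive.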


QUADRATIC  ELEMENT.  A  triangular  three  node  bilinear  axisymmetric  element 
for  total  Lagrangian  analysis  of  elastomers  is  given  in  reference  [11]. 

The quadratic six node element is shown in Figure 4. We use quadratic
interpolation of $r$ and $z$ (deformed coordinates) over triangles in the $(\alpha,\beta)$
plane (undeformed coordinates). The discrete variables associated with the
element are

$$\vec e^{\,T} = (r_1, z_1, r_2, z_2, \ldots, r_6, z_6) \qquad (25)$$

where $r_1$ and $z_1$ are the values of $r$ and $z$ at node number 1, etc. The
element nodal point numbering is shown in Figure 4.

The interpolations are described in terms of the local coordinate
system $(\xi,\eta)$ shown in Figure 5. The mapping from the $(\alpha,\beta)$ plane onto the
$(\xi,\eta)$ plane is done with

$$\alpha = \alpha_1(1-\xi-\eta)+\alpha_2\xi+\alpha_3\eta$$
$$\beta = \beta_1(1-\xi-\eta)+\beta_2\xi+\beta_3\eta \qquad (26)$$


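and $r$ and $z$ are interpolated inside the element by

$$r = \vec e^{\,T}\vec\phi, \qquad z = \vec e^{\,T}\vec\psi \qquad (27)$$

where the shape function vectors are

$$\vec\phi^{\,T} = (\phi_1, 0, \phi_2, 0, \phi_3, 0, \phi_4, 0, \phi_5, 0, \phi_6, 0) \qquad (28)$$

and

$$\vec\psi^{\,T} = (0, \phi_1, 0, \phi_2, 0, \phi_3, 0, \phi_4, 0, \phi_5, 0, \phi_6) \qquad (29)$$

with the six shape functions

$$\begin{bmatrix} \phi_1 \\ \vdots \\ \phi_6 \end{bmatrix} = \begin{bmatrix} 1 & -3 & -3 & 2 & 4 & 2 \\ \vdots & & & & & \vdots \end{bmatrix} \begin{bmatrix} 1 \\ \xi \\ \eta \\ \xi^2 \\ \xi\eta \\ \eta^2 \end{bmatrix} \qquad (30)$$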
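For readers who want the shape functions in explicit form, the sketch below uses the standard six-node quadratic triangle. The first function matches the legible row of (30), $\phi_1 = 1 - 3\xi - 3\eta + 2\xi^2 + 4\xi\eta + 2\eta^2$; placing midside nodes 4, 5, 6 on edges 1-2, 2-3, 3-1 is an assumption, as are all function names.

```python
def shape_functions(xi, eta):
    """Quadratic triangle shape functions; nodes 1-3 at corners, 4-6 at midsides."""
    zeta = 1.0 - xi - eta
    return [
        zeta * (2.0 * zeta - 1.0),   # node 1 at (0, 0)
        xi * (2.0 * xi - 1.0),       # node 2 at (1, 0)
        eta * (2.0 * eta - 1.0),     # node 3 at (0, 1)
        4.0 * xi * zeta,             # node 4, midside 1-2 (assumed numbering)
        4.0 * xi * eta,              # node 5, midside 2-3 (assumed numbering)
        4.0 * eta * zeta,            # node 6, midside 3-1 (assumed numbering)
    ]

def shape_derivatives(xi, eta):
    """Return (d phi / d xi, d phi / d eta) as two lists of six values."""
    zeta = 1.0 - xi - eta
    d_xi = [1.0 - 4.0 * zeta, 4.0 * xi - 1.0, 0.0,
            4.0 * (zeta - xi), 4.0 * eta, -4.0 * eta]
    d_eta = [1.0 - 4.0 * zeta, 0.0, 4.0 * eta - 1.0,
             -4.0 * xi, 4.0 * xi, 4.0 * (zeta - eta)]
    return d_xi, d_eta

def interpolate(e, xi, eta):
    """Evaluate (r, z) from e = (r1, z1, ..., r6, z6), as in equation (27)."""
    phi = shape_functions(xi, eta)
    r = sum(phi[i] * e[2 * i] for i in range(6))
    z = sum(phi[i] * e[2 * i + 1] for i in range(6))
    return r, z

NODES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]

if __name__ == "__main__":
    print(shape_functions(0.2, 0.3), sum(shape_functions(0.2, 0.3)))
```

Each function equals one at its own node and zero at the other five, and the six functions sum to one everywhere.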

As for the partial derivatives of $\phi$ with respect to $\xi$ and $\eta$ we have

$$\begin{bmatrix} \phi_{1\xi} \\ \phi_{2\xi} \\ \phi_{3\xi} \\ \phi_{4\xi} \\ \vdots \end{bmatrix} = \begin{bmatrix} -3 & 4 & 4 \\ -1 & 4 & 0 \\ 0 & 0 & 0 \\ 4 & -8 & -4 \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} 1 \\ \xi \\ \eta \end{bmatrix} \qquad (31)$$

and

$$\begin{bmatrix} \phi_{1\eta} \\ \phi_{2\eta} \\ \phi_{3\eta} \\ \vdots \end{bmatrix} = \begin{bmatrix} -3 & 4 & 4 \\ 0 & 0 & 0 \\ -1 & 0 & 4 \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} 1 \\ \xi \\ \eta \end{bmatrix} \qquad (32)$$


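By the chain rule of partial differentiation we have from equation (26)
that

$$\begin{bmatrix} \partial/\partial\xi \\ \partial/\partial\eta \end{bmatrix} = \begin{bmatrix} \alpha_2-\alpha_1 & \beta_2-\beta_1 \\ \alpha_3-\alpha_1 & \beta_3-\beta_1 \end{bmatrix} \begin{bmatrix} \partial/\partial\alpha \\ \partial/\partial\beta \end{bmatrix} \qquad (33)$$

and the Jacobian of this mapping is

$$\delta = (\alpha_2-\alpha_1)(\beta_3-\beta_1) - (\alpha_3-\alpha_1)(\beta_2-\beta_1) \qquad (34)$$

Inversion of equation (33) yields

$$\begin{bmatrix} \partial/\partial\alpha \\ \partial/\partial\beta \end{bmatrix} = \frac{1}{\delta}\begin{bmatrix} \beta_3-\beta_1 & -(\beta_2-\beta_1) \\ -(\alpha_3-\alpha_1) & \alpha_2-\alpha_1 \end{bmatrix} \begin{bmatrix} \partial/\partial\xi \\ \partial/\partial\eta \end{bmatrix} \qquad (35)$$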
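Differentiating $r$ and $z$ in equation (27) with respect to $\xi$ and $\eta$ leads to

$$r_\xi = \vec e^{\,T}\vec\phi_\xi, \qquad r_\eta = \vec e^{\,T}\vec\phi_\eta, \qquad z_\xi = \vec e^{\,T}\vec\psi_\xi, \qquad z_\eta = \vec e^{\,T}\vec\psi_\eta \qquad (36)$$

and with equation (35) we get that

$$r_\alpha = \vec e^{\,T}\vec p, \qquad \vec p = \frac{1}{\delta}\left((\beta_3-\beta_1)\vec\phi_\xi - (\beta_2-\beta_1)\vec\phi_\eta\right)$$
$$z_\alpha = \vec e^{\,T}\vec q, \qquad \vec q = \frac{1}{\delta}\left((\beta_3-\beta_1)\vec\psi_\xi - (\beta_2-\beta_1)\vec\psi_\eta\right)$$
$$r_\beta = \vec e^{\,T}\vec r, \qquad \vec r = \frac{1}{\delta}\left(-(\alpha_3-\alpha_1)\vec\phi_\xi + (\alpha_2-\alpha_1)\vec\phi_\eta\right)$$
$$z_\beta = \vec e^{\,T}\vec s, \qquad \vec s = \frac{1}{\delta}\left(-(\alpha_3-\alpha_1)\vec\psi_\xi + (\alpha_2-\alpha_1)\vec\psi_\eta\right) \qquad (37)$$

where $\delta$ is the Jacobian of the mapping from $(\alpha,\beta)$ onto $(\xi,\eta)$ as given by
equation (34).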

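The principal stretch ratios are given by (see reference [11])

$$\begin{aligned}
\lambda_1^2 &= \tfrac{1}{2}\big(A + B + ((A-B)^2 + 4C^2)^{1/2}\big),\\
\lambda_2^2 &= \tfrac{1}{2}\big(A + B - ((A-B)^2 + 4C^2)^{1/2}\big),\\
\lambda_3^2 &= \frac{r^2}{a^2}
\end{aligned} \tag{38}$$

where

$$A = r_a^2 + z_a^2, \qquad B = r_\beta^2 + z_\beta^2, \qquad C = r_a r_\beta + z_a z_\beta. \tag{39}$$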
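As an illustration of equations (38) and (39), the sketch below evaluates the three principal stretch ratios from the deformation gradients at a point. The function name and argument names are hypothetical; only the formulas come from the text.

```python
import numpy as np

def principal_stretches(r_a, z_a, r_b, z_b, r, a):
    """Principal stretch ratios from equations (38)-(39).

    r_a, z_a: derivatives of r and z with respect to a
    r_b, z_b: derivatives of r and z with respect to beta
    r, a:     deformed and undeformed radial coordinates
    """
    A = r_a**2 + z_a**2
    B = r_b**2 + z_b**2
    C = r_a*r_b + z_a*z_b
    root = np.sqrt((A - B)**2 + 4.0*C**2)
    lam1 = np.sqrt(0.5*(A + B + root))
    # Guard against a tiny negative value caused by round-off.
    lam2 = np.sqrt(np.maximum(0.5*(A + B - root), 0.0))
    lam3 = r / a
    return lam1, lam2, lam3
```

For an undeformed element (r = a, z = beta) all three stretches equal one, which is a convenient check before assembling the element matrices.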


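Then

$$A = \vec{e}^{\,T}[\vec{p}\,\vec{p}^{\,T}]\vec{e} + \vec{e}^{\,T}[\vec{q}\,\vec{q}^{\,T}]\vec{e}, \qquad
B = \vec{e}^{\,T}[\vec{r}\,\vec{r}^{\,T}]\vec{e} + \vec{e}^{\,T}[\vec{s}\,\vec{s}^{\,T}]\vec{e}$$

and

$$C = \vec{e}^{\,T}[\vec{p}\,\vec{r}^{\,T} + \vec{q}\,\vec{s}^{\,T}]\vec{e}. \tag{40}$$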


The remaining derivation of the element gradient and tangent matrices
is identical to the derivation presented in reference [11] for the bilinear
element except for the numerical integration. The elastic energy per one
radian of a typical element is

$$E = \int\!\!\int w(\lambda_1,\lambda_2,\lambda_3)\, a \, da \, d\beta \tag{41}$$

where a is given in equation (26). Then, in the $(\xi,\eta)$ plane

$$E = \delta \int_{\xi=0}^{1}\int_{\eta=0}^{1-\xi} w\, a \, d\eta \, d\xi \tag{42}$$

where $\delta$ is given by equation (34). The integration of (42) is done
numerically by sampling w and a at three integration points
with equal weights of 1/6.
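A three-point rule with equal weights of 1/6 can be sketched as follows. The coordinates of the integration points did not survive in this copy of the text, so the sketch assumes the common mid-side rule, which integrates quadratic polynomials exactly on the unit triangle; the names `integrate_triangle` and `MIDSIDE_POINTS` are illustrative.

```python
import numpy as np

# Assumed integration points: the three mid-side points of the unit triangle.
MIDSIDE_POINTS = np.array([[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]])
WEIGHT = 1.0 / 6.0

def integrate_triangle(f, delta=1.0):
    """Approximate delta * (integral of f(xi, eta) over the unit triangle)."""
    total = 0.0
    for xi, eta in MIDSIDE_POINTS:
        total += WEIGHT * f(xi, eta)
    return delta * total
```

In the element energy (42), `f` would return the product of the energy density w and the interpolated coordinate a at the sampling point, and `delta` is the Jacobian from equation (34).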


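COMPRESSION OF A CYLINDER USING QUADRATIC ELEMENTS. An analysis of a
rubber cylinder in compression was given in [11] for bilinear elements.
The Valanis-Landel energy density function was used. To demonstrate the
modified Mooney-Rivlin energy density function and the quadratic element we
again analyze the end thrust (compression) of a cylinder. To compute the
element gradient and tangent matrices we need the derivatives of the
energy density function with respect to the stretch ratios. Let $I = I_1$,
$J = I_2$ and $K = I_3$ in equation (19) and

$$(\ )_i = \frac{\partial(\ )}{\partial\lambda_i}, \qquad (\ )_{ij} = \frac{\partial^2(\ )}{\partial\lambda_i\,\partial\lambda_j} \tag{43}$$

Then, for $i \neq j$,

$$\begin{aligned}
I_i &= 2\lambda_i, &\quad I_{ii} &= 2, &\quad I_{ij} &= 0,\\
J_i &= 2\lambda_i(I-\lambda_i^2), &\quad J_{ii} &= 2(I-\lambda_i^2), &\quad J_{ij} &= 4\lambda_i\lambda_j
\end{aligned} \tag{44}$$

and

$$K_i = \frac{2K}{\lambda_i}, \qquad K_{ii} = \frac{2K}{\lambda_i^2}, \qquad K_{ij} = \frac{4K}{\lambda_i\lambda_j}.$$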

Computing  the  derivatives  we  have 


W1  "  2ci< V  y-K1/3)+2c2(Ai(I-A^)-  -|-K2/3)+  pln(K)  (45) 


'ij 


4c 

~3 


1  rr^'S^iV  5  xXk2/3)+4XaX  *’j  <46) 


i  j 


*i  "j 


ij 


and 


if  1/3  2  9  K2^  2X 

w..  =  2c. (1+  — — r-)+2c  ((I-A7)-  5  jS-)+-^(2“ln(K» 

3Aj  xf  xf 


(47) 
Figure 6 shows the original and deformed mesh for a cylinder
restrained from slipping at the top and compressed to 75% of its height.
The solution was obtained using the Newton-Raphson method as described in
reference [11].


REFERENCES. 

[1] I. Fried, "Reflections on computational approximation of elastic
incompressibility", Computers and Structures, 17, 1983, 161-168.

[2] I. Fried, "Influence of Poisson's ratio on the condition of the
stiffness matrix", Int. J. Solids and Structures, 9, 1973, 323-329.

[3] I. Fried, "Finite element analysis of incompressible material by
residual energy balancing", Int. J. Solids and Structures, 10, 1974,
993-1002.

[4] E. Beckenbach and R. Bellman, An Introduction to Inequalities, Random
House, NY, 1961.

[5] O. H. Varga, Stress-Strain Behavior of Elastic Materials, J. Wiley,
NY, 1966.

[6] R. W. Ogden and P. Chadwick, "On the deformation of solid and tubular
cylinder of incompressible isotropic material", J. Mech. Phys.
Solids, 20, 1972, 77-90.

[7] R. W. Ogden, "Volume changes associated with the deformation of
rubberlike solids", J. Mech. Phys. Solids, 24, 1976, 323-338.

[8] R. W. Ogden, "Nearly isochoric elastic deformations: applications to
rubberlike solids", J. Mech. Phys. Solids, 26, 1978, 37-57.

[9] K. C. Valanis and R. F. Landel, "The strain-energy function of a
hyperelastic material in terms of the extension ratios", J. Appl.
Physics, 38, 1967, 2997-3002.

[10] R. W. Penn, "Volume changes accompanying the extension of rubber",
Trans. Soc. Rheology, 14, 1970, 509-517.

[11] A. R. Johnson, C. J. Quigley and I. Fried, "Large deformations of
elastomer cylinders subjected to end thrust and probe penetration",
Trans. Third Army Conf. Applied Mathematics and Computing, Georgia
Inst. Tech., Atlanta, GA, 1985.


Figure 1. Variation of $\psi(\lambda) = \lambda\ln(\lambda) - \lambda + 1$ with $\lambda$.

Figure 2. Simple tension bar.

Figure 3. Relative volume change $(\lambda_1^2\lambda_3 - 1)$ vs $\lambda_3$ for the simple
tension bar.

Figure 4. Quadratic six-node triangular element in the $(a,\beta)$ plane.

Figure 5. Local coordinate system for interpolations.

Figure 6. Original and deformed mesh for a compressed cylinder.
ABSTRACT


Spectral methods, noted for their accuracy and efficiency, are used in
transonic aerodynamics problems to develop new and efficient tools that differ
markedly from conventional finite difference and finite element techniques.
Two transonic flow problems are investigated. First, a pseudospectral method
utilizing a direct inversion method has been developed to solve the steady
small disturbance flow equations for subcritical flows. Second, a numerical/
analytical method involving a Chebyshev-Fourier pseudospectral approximation
has been developed and applied to solve the two-dimensional shockless transonic
potential flow in the hodograph plane. Results have been obtained for
parabolic arc, NACA 0012 and NLR quasi-elliptical airfoils.
1 Introduction


Spectral methods have emerged recently as efficient numerical tools for
solving partial differential equations [16]. In an effort to develop new
numerical techniques for transonic flow, spectral methods are considered for
their accuracy and efficiency. They are an extension of the classical
separation of variables method, capable of approximating smooth solutions with
exponential convergence. There are three main types of spectral methods:
Galerkin, Tau and collocation (or pseudospectral); the latter is used in the
following problems for its suitability to nonlinear and variable coefficient
equations. The spectral solution is expressed analytically and hence the only
error involved is due to the truncation of the infinite series. This is only
true of smooth functions. Thus, the present work is limited to shockless
flows in which no large discontinuities occur. Note however that shock
capturing or shock fitting can be used in conjunction with spectral methods to
avoid this limitation [11]. Furthermore, a more general class of solutions may
be obtained with relatively little extra effort and the solution at any point
in the domain is obtainable with the same degree of accuracy. Finally, grid
generation is relatively trivial, eliminating a large computational portion of
most finite difference and finite element methods.

Thus, spectral methods offer definite advantages over conventional finite
difference and finite element methods. Applications of the methods to real
problems will determine their true value. Transonic flow is complicated by
the appearance of embedded supersonic (or subsonic) regions, the existence of
shock waves and the nonlinearity of the governing equations, which are of
mixed (hyperbolic-elliptic) type. In the following, the shock wave problem
has already been eliminated by considering shockless flows. The nonlinearity
of the equations is easily handled by the pseudospectral method. No special
considerations are needed for the embedded regions corresponding to the change
in type of the equations when spectral methods are used. Transonic flow
problems of two types are investigated. The first is a calculation of
transonic flow about airfoils using the transonic small disturbance equation.
This equation is derived from the full potential equation by assuming that the
disturbances introduced into the flow field by a body are at least an order of
magnitude smaller than the undisturbed conditions. This approximation is
valid for thin airfoils, where the maximum thickness to chord ratio is at least
an order of magnitude less than unity, at small angles of attack. The
transonic small disturbance equation is highly nonlinear and is solved by a
pseudospectral method. The second is a calculation of transonic flow about
airfoils using the hodograph formulation. This involves transforming the full
potential equation to the hodograph plane, where the independent variables are
the components of velocity. The resulting hodograph equation is a linear
variable coefficient equation which can be efficiently solved by a
pseudospectral method. In both cases different sets of boundary conditions
and other difficulties must be addressed.

In the following, the pseudospectral method is first described in general
terms. The transonic small disturbance problem follows with
specific considerations for this case. Results for parabolic arc airfoils in
transonic flow are presented. The hodograph problem is then described and an
iterative solution procedure is presented. Results include subsonic flow
about a NACA 0012 and transonic flow about a NLR quasi-elliptical airfoil.

2  Pseudospectral  Method 

Spectral methods are based on representation of the solution to an equation as a truncated series
of known smooth functions of the independent variables. For an ordinary or partial differential
equation of the form

$$L u(x, y, \ldots) = 0 \tag{1}$$

where L denotes an arbitrary differential operator, the solution u is represented by

$$u = \sum_{n=0}\sum_{m=0}\cdots A_{nm\ldots}\, X_n(x)\, Y_m(y) \cdots \tag{2}$$

where $X_n(x), Y_m(y), \ldots$ are known smooth functions, and $A_{nm\ldots}$ are unknown coefficients. Each group
of these functions is usually a set of orthogonal functions chosen to suit the particular problem.
The most popular choices of functions are traditionally Fourier, Chebyshev or Legendre functions.
However any set of orthogonal functions can be used. In the present description of spectral methods
let us assume Chebyshev polynomials will be used.

In addition, in a pseudospectral method, the dependent variables are represented by a set of
values at the collocation points $(x_i, y_j, \ldots)$ as

$$u_{ij\ldots} = \sum_{n=0}^{N}\sum_{m=0}^{M}\cdots A_{nm\ldots}\, X_n(x_i)\, Y_m(y_j) \cdots \tag{3}$$

Other terms in equation (1) such as the nonlinear or variable coefficient terms are evaluated at
the collocation points. For Chebyshev collocation methods the collocation points are given by

$$x_j = \cos\frac{\pi j}{N} \quad \text{for } j = 0, \ldots, N \tag{4}$$

which gives uneven grid spacing, points being clustered near the edges of the domain.

In what follows, the principle of collocation spectral methods is demonstrated in a one-dimensional
situation with Chebyshev polynomials. The extension to two dimensions is straightforward. For a
one-dimensional problem,

$$L u(x) = 0, \tag{5}$$

let

$$u(x) = \sum_{n=0}^{N} A_n T_n(x) \tag{6}$$

where $T_n(x)$ are the Chebyshev polynomials, given with their properties in [7]. u is evaluated at
the collocation points (4) as follows:

$$u_j = \sum_{n=0}^{N} A_n T_n(x_j). \tag{7}$$

The coefficients $A_n$ are calculated by inverting the series

$$A_n = \frac{2}{N c_n}\sum_{k=0}^{N} \frac{1}{c_k}\, u_k\, T_n(x_k) \tag{8}$$

where

$$c_0 = c_N = 2, \qquad c_j = 1 \text{ for all other } j. \tag{9}$$
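Equations (4), (7) and (8) translate directly into a few lines of code. The sketch below uses a plain matrix product rather than an FFT; the function names are illustrative and are not taken from the paper.

```python
import numpy as np

def cheb_points(N):
    """Chebyshev collocation points x_j = cos(pi*j/N), j = 0..N (equation 4)."""
    return np.cos(np.pi * np.arange(N + 1) / N)

def cheb_matrix(N):
    """Matrix with entries T[j, n] = T_n(x_j) = cos(n*pi*j/N)."""
    j = np.arange(N + 1)
    return np.cos(np.pi * np.outer(j, j) / N)

def cheb_coefficients(u):
    """Coefficients A_n from collocation values u_k (equations 8 and 9)."""
    N = len(u) - 1
    c = np.ones(N + 1)
    c[0] = c[N] = 2.0
    T = cheb_matrix(N)
    # (T.T @ (u / c))[n] = sum_k u_k T_n(x_k) / c_k
    return (2.0 / (N * c)) * (T.T @ (u / c))
```

Evaluating `cheb_matrix(N) @ cheb_coefficients(u)` recovers the collocation values `u`, which is the discrete form of equation (7).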


Once the coefficients $A_n$ are found, derivative terms $u_x$, $u_{xx}$, etc. of the equation are easily calculated
in spectral space due to the properties of the Chebyshev (or Fourier) functions, which are given in
[7]. As an example let $u_x$ be calculated in Chebyshev spectral space:

$$u_x = \sum_{n=0}^{N} B_n T_n(x) \tag{10}$$

where the coefficients are given by

$$B_n = \frac{2}{c_n}\sum_{\substack{p=n+1 \\ p+n \text{ odd}}}^{N} p\, A_p. \tag{11}$$

The coefficients of the sums can be evaluated with the use of Fast Fourier Transform (FFT)
algorithms in $O(N\log N)$ operations for a one-dimensional problem. While the spatial derivative
is evaluated spectrally, the temporal derivative is evaluated by a finite difference discretization such
as explicit forward Euler or Crank-Nicolson.
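The recurrence (11) can be sketched as below. This is a direct $O(N^2)$ evaluation for clarity, not the FFT-based variant mentioned in the text; the function name is hypothetical. Here $c_0 = 2$ and $c_n = 1$ for $n \geq 1$, and $B_N = 0$ because the sum is empty.

```python
import numpy as np

def cheb_derivative_coefficients(A):
    """Chebyshev coefficients B_n of u_x from the coefficients A_n of u (equation 11)."""
    A = np.asarray(A, dtype=float)
    N = len(A) - 1
    B = np.zeros(N + 1)
    for n in range(N):
        c_n = 2.0 if n == 0 else 1.0
        p = np.arange(n + 1, N + 1, 2)   # indices with p + n odd
        B[n] = (2.0 / c_n) * np.sum(p * A[p])
    return B
```

For example, the coefficients of $T_3$ map to those of $3T_0 + 6T_2$, which is the Chebyshev expansion of its derivative $12x^2 - 3$.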

For two-dimensional problems the approach is similar to the above description. However, to
efficiently invert the many sums involved there must be special considerations for the
two-dimensional case. Indeed the sum

$$u = \sum_{n=0}^{N}\sum_{m=0}^{M} A_{nm}\, T_n(x)\, T_m(y) \tag{12}$$

can be inverted using a FFT within $O(N^2\log N)$ operations. However it is shown in [10] that for
N < 64 the use of a direct inversion method is more efficient than the FFT.

In the direct inversion method the collocation solution $u_{ij}$ can actually be written in terms of
matrices as follows:

$$[u] = [T_x][A][T_y]^T \tag{13}$$

where [A] is the full N×M matrix of coefficients $A_{nm}$ and $[T_x]$ is the full N×N matrix of functions
in x, $[T_y]$ the M×M matrix in y, such that the matrix element is given by

$$T_{jn} = T_n(x_j) \tag{14}$$

for Chebyshev collocation, $x_j$ given by (4). The matrix [A] can be calculated by inversion of (13),

$$[A] = [T_x]^{-1}[u]\left([T_y]^T\right)^{-1} \tag{15}$$

in $O(N^3)$ operations by a pivoting matrix algorithm. This is in fact more efficient than the FFT
since the [T] matrices do not change through the iteration procedure and hence only need to be
inverted once at the beginning of the computation. Derivatives may be calculated as in the
one-dimensional case with new coefficients $B_{nm}$ or using a matrix derivative operator as follows. The x
derivative of u may be written

$$[u_x] = [D][u] \quad \text{where} \quad [D] = [T][P][T]^{-1}. \tag{16}$$

Here [P] is the matrix form of the recurrence (11). Other derivatives are similarly expressed,

$$[u_y] = [u][D]^T, \quad [u_{xx}] = [D]^2[u], \quad [u_{yy}] = [u]\left([D]^T\right)^2, \quad [u_{xy}] = [D][u][D]^T. \tag{17}$$
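The matrix derivative operator of equations (16) and (17) can be assembled explicitly. The following sketch builds [P] from the recurrence (11) and forms [D] = [T][P][T]^{-1}; it assumes the same number of points in x and y and uses illustrative names.

```python
import numpy as np

def cheb_derivative_matrix(N):
    """Collocation derivative matrix [D] = [T][P][T]^-1 on x_j = cos(pi*j/N)."""
    j = np.arange(N + 1)
    T = np.cos(np.pi * np.outer(j, j) / N)      # T[j, n] = T_n(x_j)
    P = np.zeros((N + 1, N + 1))                # B = P @ A, from equation (11)
    for n in range(N):
        c_n = 2.0 if n == 0 else 1.0
        for p in range(n + 1, N + 1, 2):        # p + n odd
            P[n, p] = 2.0 * p / c_n
    # [T] is fixed for a given N, so this inverse is computed only once.
    return T @ P @ np.linalg.inv(T)

def derivatives_2d(u, D):
    """Return u_x, u_y, u_xx, u_yy for u[i, j] = u(x_i, y_j) (equation 17)."""
    u_x = D @ u
    u_y = u @ D.T
    u_xx = D @ u_x
    u_yy = u_y @ D.T
    return u_x, u_y, u_xx, u_yy
```

Because the operator differentiates the interpolating polynomial, it is exact (to round-off) for polynomials of degree at most N in each variable.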

3  The  Transonic  Small  Disturbance  Equation 

3.1  Governing  Equations 

3.1.1  Unsteady  Flow 

Transonic flows over thin airfoils can be described by

$$M^2\phi_{tt} + 2M^2\phi_{xt} = \left(1 - M^2 - M^2(\gamma+1)\phi_x\right)\phi_{xx} + \phi_{yy} \tag{18}$$

for most design considerations and flutter analysis. In equation (18), M is the free stream Mach
number, $\gamma$ is the ratio of specific heats and $\phi$ is the small disturbance velocity potential. Equation
(18) assumes that the flow is inviscid, isentropic and the perturbed quantities are small compared
to those of the mean flow. The pressure coefficient is given by linear theory as

$$C_p = -2(\phi_x + \phi_t). \tag{19}$$

The velocity vector on the boundary of the airfoil is required to satisfy the tangency condition
for the small perturbation assumption,

$$\phi_y^{\pm} = \delta\left(f_x^{\pm} + f_t^{\pm}\right) \tag{20}$$

where $\delta$ is the maximum thickness to chord ratio and $f^{\pm}$ is the basic shape of the top and bottom
of the airfoil respectively. Equation (20) is applied at the y = 0 line in accordance with thin airfoil
theory.

Across the wake the normal velocity is continuous and the jump in pressure is zero. These
conditions are imposed by letting

$$[\phi_y] = 0 \tag{21}$$

$$[\phi_x] + [\phi_t] = 0 \tag{22}$$

where $[\psi]$ denotes the jump in $\psi$ across the wake.

At large distances away from the airfoil, the disturbances must vanish. This implies that the
perturbed velocity potential and its derivatives must approach zero away from the airfoil. When
such boundary conditions are imposed with equation (18) however, outgoing disturbances are
reflected back into the computational domain. These disturbances reduce both the accuracy and
rate of convergence of any numerical scheme. The behavior of the perturbations away from the
airfoil is accounted for by the fact that equation (18) is an inviscid hyperbolic equation in time.
The disturbances modeled by equation (18) will therefore retain a finite amplitude even at infinite
distances. To prevent the outgoing waves from being reflected back into a rectangular computational
domain several researchers [4] use the following radiating boundary conditions


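$$\left(\frac{\partial}{\partial y} \pm \frac{M}{\sqrt{1-M^2}}\,\frac{\partial}{\partial t}\right)\phi = 0 \tag{23}$$

for the top and bottom boundaries respectively. The nonreflecting upstream and downstream
conditions are satisfied by

$$\left(\frac{\partial}{\partial x} \mp \frac{M}{1 \mp M}\,\frac{\partial}{\partial t}\right)\phi = 0. \tag{24}$$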


3.1.2  Steady  Flow 

For steady flows equation (18) reduces to

$$\left(1 - M^2 - M^2(\gamma+1)\phi_x\right)\phi_{xx} + \phi_{yy} = 0 \tag{25}$$

and the pressure coefficient is given by

$$C_p = -2\phi_x. \tag{26}$$

The boundary condition on the airfoil is the tangency condition for a rigid shape

$$\phi_y^{\pm} = \delta f_x^{\pm} \tag{27}$$

and in the wake $[\phi_x]$ and $[\phi_y]$ are zero.

In the far field, $\phi$ and its derivatives approach zero. Conditions that were found to be numerically
stable for a rectangular domain are

$$\phi_y = 0 \tag{28}$$

at the top and bottom boundaries and

$$\phi_x = 0 \tag{29}$$

upstream and downstream of the airfoil.

3.2  Pseudospectral  Approximation 

In this section, a numerical method based on pseudospectral discretization in space will be
developed to solve the steady small disturbance equation over a parabolic arc airfoil at zero incidence.
The equations will be discretized on a multidomain grid system which simplifies the implementation
of the tangency condition and enhances the efficiency of the numerical scheme. An artificial time
dependence is imposed on the steady equation so that a forward Euler scheme can be used to iterate
the solution to a steady state.
3.2.1 Iteration Scheme

The following time dependence is introduced into equation (25)

$$\phi_\tau = \left(1 - M^2 - M^2(\gamma+1)\phi_x\right)\phi_{xx} + \phi_{yy} \tag{30}$$

where $\tau$ is not a physical time but a pseudotime for iteration purposes. When steady state is
achieved, the $\tau$ dependence will vanish and equation (30) will be identical to equation (25).

The time derivative in equation (30) is discretized by using the explicit forward Euler
representation. The resulting temporal discretization is

$$\phi^{n+1} - \phi^{n} = \Delta\tau\left[\left(1 - M^2 - M^2(\gamma+1)\phi_x^{n}\right)\phi_{xx}^{n} + \phi_{yy}^{n}\right] \tag{31}$$
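One pseudotime step of equation (31) can be sketched as follows. This is a minimal illustration on a single square domain [-1, 1] x [-1, 1] with the same number of Chebyshev points in x and y and no boundary treatment; the function names and the default value of gamma are assumptions, not taken from the paper.

```python
import numpy as np

def cheb_derivative_matrix(N):
    """Collocation derivative matrix [D] = [T][P][T]^-1 on x_j = cos(pi*j/N)."""
    j = np.arange(N + 1)
    T = np.cos(np.pi * np.outer(j, j) / N)
    P = np.zeros((N + 1, N + 1))
    for n in range(N):
        for p in range(n + 1, N + 1, 2):
            P[n, p] = 2.0 * p / (2.0 if n == 0 else 1.0)
    return T @ P @ np.linalg.inv(T)

def tsd_residual(phi, D, mach, gamma=1.4):
    """Right-hand side of equation (30) for phi[i, j] = phi(x_i, y_j)."""
    phi_x = D @ phi
    phi_xx = D @ phi_x
    phi_yy = phi @ D.T @ D.T
    return (1.0 - mach**2 - mach**2 * (gamma + 1.0) * phi_x) * phi_xx + phi_yy

def euler_step(phi, D, mach, dtau, gamma=1.4):
    """Advance phi by one forward Euler pseudotime step (equation 31)."""
    return phi + dtau * tsd_residual(phi, D, mach, gamma)
```

The nonlinear coefficient is formed pointwise at the collocation points, which is what makes the pseudospectral approach convenient for this equation. Being explicit, the step size `dtau` has to be kept small for stability.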

3.2.2  Evaluating  the  Derivatives 

The derivatives in equation (31) are evaluated using the pseudospectral techniques developed
in the previous section. The potential at time level n is written as

$$\phi(x_i, y_j)^{n} = \sum_{k}\sum_{l} A_{kl}^{n}\, T_k(x_i)\, T_l(y_j) \tag{32}$$

where

$$T_k(x_i) = \cos\left(k\cos^{-1}(x_i)\right)$$

are the Chebyshev polynomials evaluated at the corresponding collocation points. The potential
in equation (32) can be written in matrix notation as $[\phi]$, where the elements $\phi_{ij}$ denote the value
of the potential at points $x_i$ and $y_j$. Derivatives at time level n are then calculated as above from

$$\begin{aligned}
[\phi]_x &= [D][\phi], &\quad [\phi]_{xx} &= [D]^2[\phi],\\
[\phi]_y &= [\phi][D]^T, &\quad [\phi]_{yy} &= [\phi]\left([D]^T\right)^2
\end{aligned} \tag{33}$$

where the elements $\phi_{x,ij}$, $\phi_{xx,ij}$, $\phi_{y,ij}$ and $\phi_{yy,ij}$ are the values of the derivatives of $\phi$ at the points $x_i$
and $y_j$.


3.2.3  Computational  Domain 

Stretching To apply spectral solution to an arbitrary range of the physical variables, it is
necessary to introduce computational variables which map the range of the physical variables to the
computational range of [-1, 1]. The physical variables x and y are mapped linearly onto the
computational variables $\xi$ and $\eta$ by

$$x = \frac{x_b - x_a}{2}\,\xi + \frac{x_b + x_a}{2}, \qquad y = \frac{y_b - y_a}{2}\,\eta + \frac{y_b + y_a}{2} \tag{34}$$

where

$$x_a \le x \le x_b, \qquad y_a \le y \le y_b.$$
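The linear stretching can be illustrated in a few lines. The sketch below assumes a mapping of the form x = (hi - lo)/2 * xi + (hi + lo)/2; the function names are illustrative. The factor returned by `derivative_scale` is what multiplies a derivative taken in the computational variable to obtain the derivative in the physical variable.

```python
import numpy as np

def to_physical(xi, lo, hi):
    """Map computational points xi in [-1, 1] linearly onto [lo, hi]."""
    return 0.5 * (hi - lo) * np.asarray(xi, dtype=float) + 0.5 * (hi + lo)

def derivative_scale(lo, hi):
    """Chain-rule factor d(xi)/dx = 2 / (hi - lo) for the linear map."""
    return 2.0 / (hi - lo)
```

For a subdomain spanning [lo, hi], the physical x derivative on the collocation grid is then `derivative_scale(lo, hi) * (D @ phi)`, with `D` the derivative matrix on [-1, 1].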


Multidomain Solution The development for multidomain solution is limited to rectangular
subdomains aligned with the physical coordinate axes. Continuity of the dependent variable and its
normal derivative will be imposed at the interface of subdomains. For the rectangular subdomains,
the normal derivative requirement amounts to letting $\phi_x$ be continuous at x faces and $\phi_y$ be
continuous at y faces.

Figure (1) shows the three domain grid which is used to solve equation (25) over a symmetric
airfoil at zero angle of attack. The dependent variables in domains I, II and III are denoted as
$\phi^{I}$, $\phi^{II}$ and $\phi^{III}$ respectively. The entire computational domain is bounded from $x = x_0$ to $x = x_3$
where subdomain I lies between $x = x_0$ and $x = x_1$, subdomain II between $x = x_1$ and $x = x_2$ and
subdomain III between $x = x_2$ and $x = x_3$.

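Continuity across the subdomains is implemented by letting

$$\phi^{I}_x(x_1, y_j) = \phi^{I}_{x,Nj} = \alpha\sum_{k=0}^{N} D_{Nk}\,\phi^{I}_{kj} \tag{35}$$

$$\phi^{II}_x(x_1, y_j) = \phi^{II}_{x,0j} = \beta\sum_{k=0}^{N} D_{0k}\,\phi^{II}_{kj} \tag{36}$$

$$\phi^{II}_x(x_2, y_j) = \phi^{II}_{x,Nj} = \beta\sum_{k=0}^{N} D_{Nk}\,\phi^{II}_{kj} \tag{37}$$

$$\phi^{III}_x(x_2, y_j) = \phi^{III}_{x,0j} = \gamma\sum_{k=0}^{N} D_{0k}\,\phi^{III}_{kj} \tag{38}$$

where

$$\alpha = \frac{2}{x_1 - x_0}, \qquad \beta = \frac{2}{x_2 - x_1}, \qquad \gamma = \frac{2}{x_3 - x_2}$$

(here $\gamma$ is a scale factor, not the ratio of specific heats).

Equating the above conditions according to the interface conditions and letting

$$\phi^{I}_{Nj} = \phi^{II}_{0j} = v_j, \qquad \phi^{II}_{Nj} = \phi^{III}_{0j} = w_j \tag{39}$$

the following equations can be used to solve for $\phi$ at the subdomain interfaces in terms of $v_j$ and
$w_j$:

$$\beta D_{N0}\,v_j + \left[\beta D_{NN} - \gamma D_{00}\right]w_j = \gamma\sum_{k=1}^{N} D_{0k}\,\phi^{III}_{kj} - \beta\sum_{k=1}^{N-1} D_{Nk}\,\phi^{II}_{kj} \tag{40}$$

and

$$\left[\alpha D_{NN} - \beta D_{00}\right]v_j - \beta D_{0N}\,w_j = \beta\sum_{k=1}^{N-1} D_{0k}\,\phi^{II}_{kj} - \alpha\sum_{k=0}^{N-1} D_{Nk}\,\phi^{I}_{kj} \tag{41}$$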


3.3  Solution  Algorithm 

The  following  algorithm  is  used  to  iterate  equation  (31)  on  the  multidomain  grid  until  a  steady 
state  solution  is  obtained: 

1. Compute the $[D]$ and $[D]^2$ matrices

2. Compute derivatives of $[\phi]$ using equation (33) at the previous time level

3. Impose boundary conditions at far field

4. Compute value of $\phi$ at the interface of the subdomains from equations (40) and (41)

5. Evaluate $\phi$ at level n + 1 from equation (31)

6.  Repeat  from  step  (2)  until  convergence  criterion  is  met 
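The structure of this loop can be illustrated on a much simpler model problem. The sketch below is not the multidomain transonic solver: it assumes a single square domain, sets M = 0 so that equation (30) reduces to a pseudotime heat equation whose steady state satisfies Laplace's equation, and replaces the far-field and interface conditions (steps 3 and 4) with fixed Dirichlet boundary values. All names and parameter values are illustrative.

```python
import numpy as np

def cheb_derivative_matrix(N):
    """Collocation derivative matrix [D] = [T][P][T]^-1 on x_j = cos(pi*j/N)."""
    j = np.arange(N + 1)
    T = np.cos(np.pi * np.outer(j, j) / N)
    P = np.zeros((N + 1, N + 1))
    for n in range(N):
        for p in range(n + 1, N + 1, 2):
            P[n, p] = 2.0 * p / (2.0 if n == 0 else 1.0)
    return T @ P @ np.linalg.inv(T)

def iterate_to_steady_state(boundary_fn, N=8, dtau=1.0e-3, tol=1.0e-10,
                            max_iter=20000):
    """Forward Euler pseudotime iteration of phi_tau = phi_xx + phi_yy."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    X, Y = np.meshgrid(x, x, indexing="ij")
    D = cheb_derivative_matrix(N)              # step 1: [D] and [D]^2
    D2 = D @ D
    phi = np.zeros((N + 1, N + 1))
    edge = np.ones_like(phi, dtype=bool)
    edge[1:-1, 1:-1] = False
    phi[edge] = boundary_fn(X, Y)[edge]        # boundary values (stand-in for step 3)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        rhs = D2 @ phi + phi @ D2.T            # step 2: derivatives at level n
        new_phi = phi.copy()
        new_phi[1:-1, 1:-1] += dtau * rhs[1:-1, 1:-1]   # step 5: level n + 1
        change = np.max(np.abs(new_phi - phi))
        phi = new_phi
        if change < tol:                       # step 6: convergence criterion
            break
    return X, Y, phi, iterations
```

With boundary data taken from a harmonic polynomial such as x^2 - y^2, the iteration should settle on that polynomial in the interior. The small pseudotime step reflects the severe stability restriction of explicit schemes on Chebyshev grids.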

3.4  Results 

The pseudospectral method is validated through solution to classical partial differential equations [9].
Results for the parabolic arc airfoil are presented here. Figure (2) shows the computed pressure
coefficient over a parabolic arc airfoil at zero angle of attack and a Mach number of 0.825. Results
are shown for parabolic arcs of thickness ratio ranging from 1 to 6% in increments of 1%. The
calculations were done on a 25×9 grid as shown in figure (2). The computations were halted when
the pressure coefficient at the center of the airfoil did not change to 3 decimal places, or about 500 iterations.
For these thin airfoil cases at a Mach number of 0.825 the flow is subcritical and hence no shocks are
observed. These results are in excellent agreement with those of Sivaneri and Harris [18] using a
much coarser grid. The CPU time required for this calculation is 2 minutes on a VAX 750.

The results show that the pseudospectral method is a viable, efficient technique to investigate
transonic flows, and work is in progress to further develop the method for unsteady flows.

4  The  Hodograph  Equation 

4.1  Motivation 

The main objective in this case is to create an efficient method of predicting transonic flow
over two-dimensional bodies by combining the hodograph formulation with spectral methods. As
a by-product we also hope to develop an inverse design code for shockless airfoils in transonic flow.
The assumptions are two-dimensional, steady, inviscid and isentropic flow.

The advantage of using the hodograph formulation is that the governing equations become
linear, thereby enabling solution by linear superposition. However this advantage is only gained
at the expense of complicating the boundary conditions as well as our sense of where the body
is in the hodograph domain. Many different hodograph methods have been used in the past (see
[13]), the most successful being that of Garabedian, Korn and Bauer [2] which produced the shockless
Korn airfoil. The ability to design shockless airfoils is important since a large amount of the drag
in transonic flow is attributed to wave drag caused by shock waves. Elimination or weakening of
shocks greatly reduces drag, which is instrumental in improving the efficiency of the transonic cruise
range in which most commercial aircraft operate. The present hodograph formulation follows that
of Nieuwland et al. [15] but uses numerical solution instead of analytic continuation and integral
transform methods.

Results in the hodograph plane are translated back to the physical plane through an analytical
transformation to obtain physical coordinates, mainly of the body. Hence, the most useful outcome
of this work should be an efficient inverse design code for shockless airfoils in transonic flow. For
design methods, Lighthill's constraints must be incorporated to ensure closure and circulation for
the designed airfoil [12]. Volpe and Melnik [19] have derived equivalent constraints for transonic
flow; a loose implementation of these constraints is included in this method, though it is not truly an
inverse design method in its present form. An iteration procedure is needed to arrive at a realistic
airfoil starting from a circular cylinder. The method is validated by two test cases, a NACA 0012
airfoil in subsonic flow and a NLR QE-.11-.75-1.375 airfoil in transonic flow.


4.2  Formulation 


Governing Equations The governing equation for steady two-dimensional potential flow in the
transonic range may be transformed by a series of independent variable transformations [13] to the
hodograph plane, where the independent variables are u and v or q and $\theta$, such that $\vec{v} = u\hat{\imath} + v\hat{\jmath} =
q e^{i\theta}$. The resulting equation in the hodograph plane may be stated in terms of $\phi$, the velocity
potential, or its complex conjugate $\psi$, the streamfunction. Here we choose to write the equation in
terms of the streamfunction and the nondimensional independent variables $\tau$ and $\theta$ defined by
$\tau = (q/q_{max})^2$ and $\theta = \arg(\vec{v})$, as follows:


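$$\psi_{\tau\tau} + \frac{1 + \frac{2-\gamma}{\gamma-1}\,\tau}{\tau(1-\tau)}\,\psi_\tau + \frac{1 - \frac{\gamma+1}{\gamma-1}\,\tau}{4\tau^2(1-\tau)}\,\psi_{\theta\theta} = 0 \tag{42}$$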


where $\tau \in [0;1[$ and $\theta \in [0;2\pi]$. $q_{max}$ is given by $(2/(\gamma-1))^{1/2}c_0$. This hodograph equation is
a linear variable coefficient partial differential equation of mixed (elliptic-hyperbolic) type. The
transformation back to the physical plane is given in integral form by


.  _  f  e<#  f  f(i-  £r Oav* 

X  +  ,eV  J  ri/*(l  -r)i/U-i)  2r(l  -  r) 
where $\epsilon$ is the thickness ratio and all variables have been non-dimensionalized appropriately. Given
a streamfunction distribution $\psi(\tau,\theta)$ one can calculate the flow in the physical plane analytically
by equation (43).


Singularities Different types of singularities appear in this formulation. The variable coefficients
of equation (42) are infinite at the limits of the domain, i.e. for $\tau = 0$ and 1. These points are
regular singular points and hence the solution to (42) is either analytic or has a pole or an algebraic
or logarithmic branch point. Branch type singularities are inherent in the hodograph formulation
due to the multi-sheeted nature of the flow representation. Typical airfoil flow hodograph solutions
must be represented by two or more sheets as illustrated in figure (3). The upper and lower
sheets are separated by a branch cut emanating from a branch point $(\tau^*, \theta^*)$. Furthermore the
streamfunction $\psi$ has a free-stream singularity of the doublet type at infinity and hence at $(\tau_\infty, 0)$
in the hodograph plane. Among other difficulties, the entire physical flow at infinity is mapped to
a single point in the hodograph plane. The Jacobian of the transformation between the hodograph
and physical planes exhibits singularities for J = 0 and J = $\infty$ which must be avoided to ensure
a one to one correspondence between the physical and hodograph planes. In the present case it is
found that these singularities do not appear and hence only the free-stream singularity at infinity
and the branch point separating the two sheets of the hodograph surface must be represented in
the final solution.

Boundary Conditions The transformation (43) appears to give a direct correspondence between
the physical and hodograph planes. However, as illustrated by figure (3), the unknown functions
and particularly the body itself are not easily identifiable in the hodograph plane. In consequence,
typical boundary conditions for external flow problems such as the farfield boundary condition, the
tangency condition and the Kutta condition at the trailing edge of the airfoil cannot be stated in
terms of $\psi$, q and $\theta$ in the hodograph plane. Instead, the only conditions that can be imposed are
that the stagnation points $\tau = 0$ and the point of maximum velocity $\tau = \tau_m$ are on the body profile
$\psi = 0$, or $\psi(\tau = 0, \theta) = 0$ and $\psi(\tau = \tau_m, \theta) = 0$. In addition, though this is not really a boundary condition,
realistic solutions are ensured by transforming the solutions in the hodograph plane back to the physical plane
via equation (43) and plotting the physical body coordinates to verify that the results are physically
meaningful. Additional boundary conditions need to be formulated for numerical reasons as will
be seen in the following section.

4.3  Method  of  Solution 

Solution of the hodograph equation forms the basis of the proposed method but does not necessarily
yield physically meaningful results directly. The method must therefore incorporate solution
of  the  hodograph  equation  into  an  encompassing  solution  procedure,  the  different  components  of 
which  are  described  individually  in  this  section. 

4.3.1  Solution  of  the  Hodograph  Equation 

Though equation (42) possesses particular solutions involving hypergeometric functions, these
are not used in the present method, since the high frequency in θ required to determine the zeros of
the Chaplygin particular solutions makes the solution of the present problem (searching for ψ = 0)
in terms of these particular solutions impractical. Linear superposition of these particular
solutions has nevertheless been used by Lighthill and Cherry [5] to yield meaningful solutions. In the
present work spectral methods are employed in the solution of equation (42) instead, since these
converge exponentially for smooth solutions and are relatively fast methods. Collocation spectral
methods are extremely well-suited to the solution of variable coefficient equations such as the
hodograph equation. However, as mentioned previously, the variable coefficients of (42) are singular
at the boundaries of the domain. Since the only singular boundary of interest is r = 0, solutions in
two domains, namely

1. about r = 0, or for r ∈ [0; r_a] (where r_a is close to 0), and

2. for r ∈ [r_a; r_b] (where r_b < 1)

must be derived.


Solution About the Singular Point For domain 1) the point r = 0 is a regular singular point
and hence an exact solution for the r variation of ψ in the neighbourhood of r = 0 may be written
in terms of a Frobenius series of the form

\psi(r) = (r - r_0)^{\alpha} \sum_{n=0}^{\infty} a_n (r - r_0)^n \qquad (44)

where all coefficients are determined by substituting into equation (42) except a_0, which remains
arbitrary. Convergence of this series is obtained for α = +k/2 only, k being the Fourier mode of the
solution in θ. The radius of convergence is r < 1. The solution is smooth and hence may be easily
matched to the pseudo-spectral solution at r_a. This solution also displays the required boundary
condition ψ(r = 0; θ) = 0, expressing the fact that the stagnation points should be on the body. a_0
and r_a are part of the set of input parameters at each iteration step.

Pseudospectral Solution Solution of (42) in domain 2), where the variable coefficients are
analytic, is obtained by application of a pseudospectral method. In this method the solution
ψ(r, θ) is expressed as a truncated sum of analytic functions, in this case Chebyshev polynomials
in r and Fourier functions in θ (in contrast to the previous problem) as follows:

A™.  (45) 

ns*  0  m=0 

where T_n(r) are the Chebyshev polynomials given in [7]. ψ is evaluated at the collocation points
as follows:

(46) 

n=0  m=0 

where for Chebyshev collocation methods the collocation points are given by r_i = cos(πi/N) for i = 0, ..., N
and for Fourier collocation, θ_j = 2πj/M for j = 0, ..., M. These collocation points define the grid. Since
the Chebyshev polynomials are defined on [-1; 1], a mapping from the arbitrary domain [r_a; r_b]
to [-1; 1] is in order. For the Chebyshev points in the r direction the points are crowded at the
edges of the domain, which turns out in this case to be ideal: high resolution near stagnation points
and the point of maximum speed. As in the transonic small disturbance problem, direct inversion
is used. All equations are hence written in matrix form,

i*i = mMW  («) 

where T_ij = T_i(r_j). Derivatives are calculated as described in section 0.
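As a minimal sketch of the grid construction described above, the following Python fragment generates Chebyshev Gauss-Lobatto points, maps them affinely from [-1; 1] to a domain [r_a; r_b], and evaluates T_n by the standard three-term recurrence. The function names and the sample bounds 0.05 and 0.9 are illustrative assumptions, not values taken from the present method.

```python
import math


def chebyshev_points(n):
    """Gauss-Lobatto points x_i = cos(pi*i/n), i = 0..n, on [-1, 1]."""
    return [math.cos(math.pi * i / n) for i in range(n + 1)]


def map_to_interval(x, r_a, r_b):
    """Affine map taking x = -1 to r_a and x = +1 to r_b."""
    return 0.5 * (r_b - r_a) * x + 0.5 * (r_b + r_a)


def chebyshev_T(n, x):
    """T_n(x) from the recurrence T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


if __name__ == "__main__":
    # Hypothetical domain bounds, chosen only for illustration.
    r_a, r_b = 0.05, 0.9
    for x in chebyshev_points(8):
        print(round(map_to_interval(x, r_a, r_b), 4))
```

Printing the mapped points shows the spacing shrinking toward both ends of the interval, which is the crowding the text relies on for resolution near the domain edges.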

The hodograph equation, though steady, is restated in an unsteady formulation

\frac{\partial \psi}{\partial t} = L\psi \qquad (48)

where L is the hodograph operator of equation (42). The steady state solution of equation (48) is
sought, so that ψ_t = 0 and hence (48) reduces to the hodograph equation (42). Time derivatives
are treated by finite difference. In this case a semi-implicit scheme given by Gottlieb and Orszag
[7], stated as

\frac{\psi^{n+1} - \psi^{n}}{\Delta t} = \frac{L_{max}}{2}\,\psi^{n+1} + \left(L - \frac{L_{max}}{2}\right)\psi^{n} \qquad (49)

is used. This scheme is unconditionally stable. In this scheme, the matrix to be inverted is
[I] - Δt[L_max]/2, where [L_max] is the operator matrix made up of [D], [D]^2, ... with the maximum
values of the variable coefficients. This matrix is always well-behaved, as opposed to the fully explicit
or implicit cases where the matrix is sometimes ill-conditioned and where time steps are limited to
very small values. It appears however that Δt must be somewhat restricted in order that [I] not
be insignificant compared with Δt[L_max]/2, since L_max is large (due to the large values of the variable
coefficients).
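The behaviour of such a semi-implicit scheme can be illustrated on a deliberately simple model problem. The sketch below is not the hodograph operator: it assumes a decoupled operator L = -c_j at each grid point with L_max = -max(c_j), and it adds a constant forcing f_j (an assumption made here so that the steady state f_j / c_j is non-trivial). The implicit part uses only the constant-coefficient operator, so the quantity to be inverted is the scalar analogue of [I] - Δt[L_max]/2.

```python
def semi_implicit_step(psi, c, f, c_max, dt):
    """Advance (psi_new - psi)/dt = (L_max/2) psi_new + (L - L_max/2) psi + f.

    Model assumption: L = -c_j (pointwise), L_max = -c_max.
    """
    denom = 1.0 + 0.5 * dt * c_max  # scalar analogue of [I] - dt [L_max]/2
    return [
        (p + dt * ((-cj + 0.5 * c_max) * p + fj)) / denom
        for p, cj, fj in zip(psi, c, f)
    ]


def march_to_steady_state(c, f, dt, steps):
    """Iterate from psi = 0 and return the state after the given number of steps."""
    c_max = max(c)
    psi = [0.0] * len(c)
    for _ in range(steps):
        psi = semi_implicit_step(psi, c, f, c_max, dt)
    return psi


if __name__ == "__main__":
    # Widely varying coefficients and a large time step (illustrative values).
    print(march_to_steady_state([0.1, 1.0, 10.0], [1.0, 1.0, 1.0], dt=10.0, steps=2000))
```

For this model the amplification factor is (1 + Δt(c_max/2 - c_j)) / (1 + Δt c_max/2), whose magnitude stays below one for any Δt > 0 when 0 < c_j ≤ c_max, so the iteration converges to f_j / c_j even with a large time step.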

Conditions and Constraints Equation (49) is only solved on the non-singular portion of the
domain [r_a; r_b]. Boundary conditions for the pseudo-spectral solution are given by

\psi(r_a; \theta; t) = K_a(\theta) \quad \text{and} \quad \psi(r_b; \theta; t) = K_b(\theta) \qquad (50)

in r and a periodic boundary condition in θ. The initial condition for the first overall iteration is
given by an exact solution to the hodograph equation involving hypergeometric functions, developed
by Cherry [5] for transonic flow about a circular cylinder. This solution actually yields a slightly
deformed circular cylinder when the above expressions are inserted into the transformation (43), as
illustrated in figure (4). The presented results were calculated by portions of this method, which are
validated by the excellent agreement with Cherry's results. Deformation of the circular cylinder
into an airfoil is ensured through the boundary conditions both at the pseudo-spectral level and
at the encompassing iterative level, as well as through the Frobenius solution about the singular
point. At this point it is useful to note that it was decided in the final form of the method that
the hodograph equation be solved by the pseudo-spectral method on two domains (subdomains of
domain 2)):

• 2a) r ∈ [r_a; r_∞]

• 2b) r ∈ [r_∞; r_b]

since the initial solution given by Cherry was so derived. (Recall that r_∞ is the branch point.) The
solutions for the individual domains are matched numerically at r_∞. Using a solution based on
that of Cherry ensures that the free-stream singularity and branch point behaviour are correctly
included.

Matching of the singular solution and the pseudospectral solution is straightforward. Either ψ
or derivatives of ψ can be matched at r_a and r_∞. This translates in either case to matching the
A_ij or B_ij coefficients. At the other end of the domain, r_b, an arbitrary but realistic boundary
condition must be imposed. We choose to use the boundary condition at the point of maximum
velocity, which occurs on the body. Hence the maximum speed on the body must be known a
priori, and the boundary condition is ψ(r_b; θ) = K_b(θ), with ψ = 0 at the point of maximum
velocity. A judicious choice of the function K_b(θ) is crucial to the quality of the results for a given case.

4.3.2 Transformation to the Physical Plane

Physical coordinates for constant streamfunction values are obtained by the transformation
(43), the most interesting coordinates being those corresponding to ψ = 0, the body contour,
obtained by secant interpolation. Upon substituting the Chebyshev-Fourier form of ψ, (45), into
(43), all integrals are readily calculated by hand. The complete expression for the coordinates x
and y is given by

[Equations for x and y: double sums over n and m of the coefficients A_nm and B_nm multiplying trigonometric factors such as cos(θ)cos(mθ) and integrals in r; not legible in the source.]

The integrals in r are all of the same form once the Chebyshev functions are written out in polynomial
form. They can be calculated by integration by parts and written in terms of a recursion
relation (see [13]). Hence the transformation back to the physical plane is done exactly with the
above analytic expressions.


4.3.3 Iteration Scheme

Though the hodograph equation is solved accurately by the pseudospectral method, the results it
yields are not necessarily physically meaningful. This is due to the fact that these results have been
calculated with insufficient boundary conditions. Mathematically these conditions are sufficient to
obtain a solution; physically they are not. In particular, the functions K_a(θ) and K_b(θ) are not
known and must be guessed for each case. Hence existence, uniqueness and physical meaning are
not guaranteed. Furthermore, since the physical coordinates for an airfoil are derived from those
for a circular cylinder, an iteration procedure is preferred. The iterations also serve the purpose
of monitoring the solution as it evolves and guiding certain parts of it by altering the
boundary values if needed, since, as shown before, the prescribed boundary conditions are not
sufficient.

Hence, an iteration scheme is set up to obtain, from Cherry's solution for flow around a circular
cylinder, that around an airfoil. The iteration operates on the boundary conditions, altering at each
step either the value of ψ at a boundary or the value of the boundary, e.g. r_b. (Recall that r_b is the
maximum velocity on the body, which must change with the shape of the body or, as it is set
up in this case, the shape of the body changes because we are trying to keep the same maximum
speed.) Most importantly, the Frobenius solution in domain 1) is altered in such a way that this
solution resembles the desired result only in the region r ∈ [0; r_a]. However, it is not evident how
this should be done.

The  entire  calculation  procedure  is  summarised  in  the  flowchart  given  in  figure  (5).  The  pressure 
coefficient  Cp  distribution  over  the  body  is  given  by 

c,  “  («> 
4.4 Results


The validity of the proposed method is demonstrated and discussed through results of test cases
for  realistic  airfoil  shapes.  Typically  one  complete  run  to  design  a  shockless  symmetric  airfoil  in 
transonic  flow  requires  approximately  15  minutes  of  CPU  time  on  a  VAX  750  where  steady  state  is 
reached  in  1000  steps  at  each  iteration  level.  Approximately  three  to  five  encompassing  iterations 
are  needed  to  obtain  a  final  solution  with  the  correct  boundary  conditions,  yielding  an  approximate 
total  of  5000  iterations  for  two  domains. 

4.4.1  NACA  0012  Airfoil 

The first test case involves trying to reproduce a NACA 0012 airfoil. Since the flow in this
method must be shockless and a NACA 0012 develops a shock in transonic flow, results for a NACA
0012 can only be obtained in a purely subsonic flow. The input parameters for the presented test
case were taken from results of a cell-centered finite volume method using the Euler equations by
Allmaras [1]. The Mach number for this test case was M_∞ = 0.60, with angle of attack α = 0. The
proposed method uses 20 collocation points in each domain, hence 40 in r and 20 in θ. This case
is referred to as a 40X20 grid. Allmaras' results were run with a 96X16 "C" type grid. Typically
ψ is interpolated to values of |ψ| = 10^-4. Results for this test case are given in figures (6,7). Only
the upper half of the body is plotted here, as the body is symmetrical.

It is found for this run that the maximum thickness is 12.3% as compared to 12% ideally. The
error in the maximum thickness is therefore 2.5% with respect to the ideal, which is acceptable
considering the crude nature of setting the coefficient of the Frobenius series. Comparison with the
NACA 0012 coordinates given by AGARD [8], shown in figure (6), indicates that agreement is
generally good. A closer look at the body coordinates is given in figure (7). In this view it is seen
that the actual profile is quite jagged. This is due to the fact that ψ is only being interpolated to
10^-4 and the final values are either positive or negative, thereby oscillating about the correct ψ = 0
profile. A smoothing function or further interpolation may be applied to obtain a smooth profile
and even better agreement with the test case. Comparison of C_p with that of Allmaras [1] (fig. (6))
brings forth one of the advantages of using the Chebyshev pseudospectral method: at the trailing
edge, where r ∈ [0; r_b], the points are crowded due to the Chebyshev collocation point distribution,
thereby giving very good resolution. Though Allmaras' results are already on a much finer grid, the
trailing edge drop in C_p back down to its level at the leading edge is not captured. Note also that
there is no smoothing in the present method, as opposed to finite difference methods. Furthermore,
the spectral method yields coefficients and hence an analytical expression for the solution, which
enables equal accuracy in the approximation to any flow variables at any point in the flow, thereby
granting accurate results without the need for very fine resolution. Errors in C_p are considerably
larger than those for the coordinates since the coordinates do not agree (and thus the pressure
distribution is not expected to coincide).
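The jaggedness attributed above to the interpolation tolerance can be seen in a small sketch of secant interpolation for a zero of a function. The routine below is a generic secant iteration, assumed here as a stand-in for the interpolation actually used; the name secant_zero and the test function are hypothetical. It stops as soon as the residual magnitude falls below the tolerance, so the accepted residual may have either sign.

```python
def secant_zero(f, x0, x1, tol=1e-4, max_iter=50):
    """Secant iteration for f(x) = 0, stopping when |f(x)| <= tol.

    Returns the last iterate and its residual; the residual may be
    positive or negative, which is what makes a traced contour jagged.
    """
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) <= tol:
            break
        if f1 == f0:
            break  # flat secant line, no further progress possible
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1, f1


if __name__ == "__main__":
    # Illustrative stand-in for psi along one grid line.
    root, residual = secant_zero(lambda x: x * x - 2.0, 1.0, 2.0)
    print(root, residual)
```

Tightening tol, or smoothing the resulting points, corresponds to the remedies suggested in the text.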

4.4.2  NLR  QE  0.11  -  0.75  -  1.375  Airfoil 

The  NLR  Quasi-Elliptical  0.11  -  0.75  -  1.375  airfoil  was  selected  for  testing  in  the  transonic  range 
since  it  is  a  symmetrical  shock-free  airfoil  designed  by  the  hodograph  method  of  Nieuwland[15] 
for which experimental data from AGARD [8] exist, including θ values, which are important in
hodograph  methods,  particularly  to  verify  the  input  parameters. 




The present airfoil was designed for the conditions M_∞ = 0.78612, e = 0.1172, M_max =
1.306, α = 0. These parameters were used in the test case of the proposed method. The design
coordinates and C_p distribution are given in AGARD [8]. Experiments from AGARD [8] give results
for M_∞ = 0.789. The results for this test case are shown in figure (8) in comparison with the design
conditions and the experimental results. Note that the experimental results are at a different
freestream Mach number and hence close agreement is not expected. Results are generally poor for
this test case, and the body coordinates are scaled by 4 to be shown in detail. Only eight points from
the hodograph solution are meaningful. The physical coordinates are again oscillating about a mean
due to the moderately accurate interpolation values of ψ, though three points actually coincide with
the body line. The C_p values are overestimated, but follow the correct trend. It appears that for
more complicated test cases the method is not robust and therefore needs improvement.

4.4.3  Discussion 

The CPU time required for one complete run (15 minutes on a VAX 750) is reasonable
when compared with other methods. An inverse design method using a finite volume method
on the Euler equations for airfoils in viscous transonic flow developed by Drela [6] requires 10 minutes
per airfoil. The method of Boerstoel and Huising [3] requires 25 minutes on a CDC 6600 for 130
coordinate points. The method of Garabedian, Korn and Bauer [2] is much more efficient, with an
approximate time of 2.5 minutes on a CDC 6600 computer. Though these figures are estimates for
average runs, perhaps with finer grids and on different sized computers, a comparison shows that
the present method is at least competitive with, if not slightly advantageous over, other inverse
design methods as far as time requirements are concerned. Additional complications in the shapes
to be designed, such as concerns with closure, angle of attack, etc., are expected to alter boundary
conditions and input parameters but not to increase CPU time significantly.

More pertinent to the physical problem is the accuracy of the method, which does not compare
very well with other inverse design methods. This, however, is expected since the method is not
really an inverse design method yet. The main problems with the method in its present form
are the prescription of the Frobenius coefficient and the seeming lack of determining or boundary
conditions, e.g. K_a(θ) and K_b(θ) in equation (50), due again to the fact that we cannot prescribe or know
distributions over the body or solution domain. However, this sparseness of set conditions is somewhat
necessary, first to retain degrees of freedom in a design method and second to satisfy the uniqueness
theorem of Morawetz [14] for shockless flows. In the present case this condition is loosely satisfied,
but mathematically there is still no assurance that the solution will be meaningful or unique in all
cases. Furthermore, the iteration scheme is rather simplistic. A more involved iteration scheme
which would enforce more stringent boundary conditions at each step should be considered. An
iteration scheme such as the Method of Parametric Differentiation [17] would prove most
useful in the more general cases for lifting airfoils with camber, angle of attack, etc. Note that
the hodograph method does not extend to three dimensional flow and hence applications are restricted to
airfoil design.

5  Conclusion 

The foregoing work demonstrates how efficiently spectral methods can be used in solving transonic
flow equations. Pseudospectral methods are used in the two given cases for their suitability
to nonlinear and variable coefficient equations. The first application is solution of the transonic
small disturbance equation for thin airfoils. A Chebyshev pseudospectral method is used to solve
the nonlinear equation using explicit forward Euler time iteration to steady state. A multidomain
grid is used to solve for flow about an airfoil, thereby requiring matching across domain interfaces.
Excellent steady results for parabolic arc airfoils in a freestream flow of Mach number 0.825
demonstrate the efficiency and validity of the pseudospectral method. Future results with the transonic
small disturbance equation will include unsteady effects for airfoils pitching and plunging in
transonic flow.

The  second  application  involves  solution  of  transonic  flow  through  the  hodograph  formulation.  A 
Chebyshev-Fourier  pseudospectral  method  is  used  to  solve  the  linear  variable  coefficient  hodograph 
equation.  In  this  case  the  overall  formulation  is  more  complicated  due  to  the  singularities  and 
boundary conditions in the hodograph plane. Solutions for the flow and the body coordinates are
obtained at once, thereby requiring more detailed information than is actually available. For this
reason,  the  presented  results  for  a  NACA  0012  in  subsonic  flow  and  a  NLR  quasi-elliptical  airfoil  in 
transonic  flow  are  not  very  accurate  but  do  offer  certain  advantages.  In  both  cases  one  must  note 
the  efficiency  and  simplicity  of  the  pseudospectral  solution  method.  Very  few  grid  points  on  the 
airfoil  are  needed  for  accuracy  comparable  to  (or  better  than  in  some  areas)  that  of  finite  difference 
and  finite  element  methods. 

The  presented  work  shows  that  spectral  methods  can  be  used  to  predict  compressible  (and 
incompressible)  flow  over  realistic  airfoil  configurations.  In  both  cases  studied,  the  spectral  method 
is relatively simple and fast compared to existing technology: finite difference, finite element methods
and various hodograph techniques. Further work in this field should investigate and overcome
other  typical  difficulties  encountered  in  transonic  flows,  yielding  an  efficient  and  novel  alternative 
to  conventional  computational  techniques  for  transonic  flows. 
Figure 1: Pseudospectral Multidomain Grid


Figure  2:  Computed  Parabolic  Arc  Airfoil  Pressure  Coefficient  Distribution 
Figure 4:
Figure 7: Close-Up Comparison of Proposed Method Solution with NACA 0012 Coordinates


Figure 8: Comparison of the Proposed Method Solution with NLR Design Conditions and Experimental
Results for the NLR QE 0.11 - 0.75 - 1.375 Airfoil
A TOOLKIT OF SYMBOL MANIPULATION PROGRAMS
FOR VARIATIONAL GRID GENERATION†


Stanly  Steinberg 
University  of  New  Mexico 
Albuquerque,  NM  87131 

Patrick J. Roache
Ecodynamics  Research  Associates 
Albuquerque,  NM  87198 

Abstract  This  paper  describes  some  of  the  mathematical  and  programming  ideas 
involved in the creation of a toolkit of symbol manipulation programs which the
authors  have  used  to  write  a  finite-difference  elliptic  partial  differential  equations  solver. 

I. Introduction The toolkit of MACSYMA symbol manipulation programs was
developed in order to use symbol manipulation technology to write FORTRAN code. A
brief description of the problems solved by the FORTRAN code is given in the Comments
section of this paper; extensive descriptions are given in the cited literature. The
current toolkit is based on a previous symbol manipulation project4,0'10. In the course of
this, as well as the current project, two problems were encountered: the need for more
memory than was available, and excessive use of computer time. Both of these problems
could be overcome by the appropriate combination of certain mathematical and
programming ideas; these ideas are described here.

The general plan for the development of the toolkit was to organize the underlying
mathematics in a way that was well-suited to symbol manipulation programming. At
the same time, the overall structure of the FORTRAN code was taken into consideration.
The toolkit is used to write subroutines that are incorporated into the FORTRAN
code. The functions in the toolkit can be thought of as straightforward implementations
of the steps that a human programmer would use to write the required subroutines.

The memory and time problems all had a similar form and solution. Straightforward
programming of the mathematics led to the creation of very large expressions
which could be avoided, or to unnecessary computations. The solution was to reorganize
the mathematics and the program to avoid the large expressions or unneeded computations.
In the previous symbol manipulation project10, the run time of one of our symbol
manipulation programs was reduced from 60 cpu hours to 8 cpu minutes using the
ideas described here. In the current project12, attempts to write one of the needed subroutines
at first caused the symbol manipulator to run out of memory after computing
for several cpu hours; this subroutine can now be written in just over one cpu hour.

† Work supported by the U.S. Air Force Office of Scientific Research, the U.S. Army
Research Office, and the U.S. Office of Naval Research. Also presented at the AIAA
24th Aerospace Sciences Meeting in January of 1986 at Reno, Nevada.
The problems encountered during the development of the toolkit are likely to occur
in other areas. Consequently, the problems and their solutions have been reformulated
in general mathematical terms. It is expected that this will promote a wider understanding
of the techniques discussed here.

It is hoped that the reader is familiar with MACSYMA8. However, MACSYMA
input and output is rather natural, so those not familiar may still be able to follow the
discussion. MACSYMA is an interactive symbol manipulator; thus it prints a prompt of
the form (c_1), (c_2), ... . The user then types a command to be executed (indicated
by italics). The commands must be terminated by a ";" or a "$"; the latter suppresses
MACSYMA's response. Usually MACSYMA responds with a line that is labeled with
(d_1), (d_2), ... or (e_1), (e_2), ... (responses are printed as displayed equations).
The symbol % stands for the previous "d_" expression. The operator ":" is used for
value assignment, the operator ":=" is used in function and array definitions, and "="
is used to describe an equality. The line labeled with "Time" is MACSYMA's estimate
of the cpu time used to do the computation specified in the previous "c_" line. The
MACSYMA sessions have been edited to conserve space.


II. Distances The functions in the toolkit compute the derivatives of quantities that
are defined in terms of the Euclidean distance. If such calculations are not handled
properly, they will cause MACSYMA to run out of memory or use unreasonable
amounts of cpu time. The two dimensional Euclidean distance,

d(x,y) = \sqrt{x^2 + y^2} , \qquad (1)

can be used to illustrate this problem. The x derivative of the distance is given by

\frac{\partial d}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}} . \qquad (2)

It is this form of the derivative that creates difficulty; a better form is

\frac{\partial}{\partial x}\, d(x,y) = \frac{x}{d(x,y)} . \qquad (3)

Note that in this formula the definition of the distance function is not needed; the
derivatives are given implicitly. If x and y depend on a parameter, the chain rule must
be used to compute the derivatives of the distance with respect to the parameter.
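A numerical analogue of the implicit form may help fix the idea. The Python sketch below (function names are illustrative, not part of the toolkit) applies the chain rule with the gradient (x/d, y/d), so the distance is evaluated once and then reused rather than being expanded into a square root in every term.

```python
import math


def distance(x, y):
    """Two dimensional Euclidean distance d(x, y)."""
    return math.sqrt(x * x + y * y)


def distance_rate(x, y, dx_dt, dy_dt):
    """d/dt of d(x(t), y(t)) using the implicit gradient (x/d, y/d)."""
    d = distance(x, y)  # the square root is evaluated once and reused
    return (x * dx_dt + y * dy_dt) / d


if __name__ == "__main__":
    # x(t) = 3t, y(t) = 4t gives d = 5t, so the rate is 5 for any t > 0.
    t = 2.0
    print(distance_rate(3.0 * t, 4.0 * t, 3.0, 4.0))
```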

The implicit form of the derivative can be implemented in MACSYMA using the
gradef command. This command specifies the gradient of a function even though the
function itself is undefined. The following demonstration will make this point clearer.
In the calculations, the arguments of the distance, x and y, will depend on the parameter
t. The demonstration uses the following MACSYMA commands: diff differentiates,
ev evaluates an expression using all of the information already in MACSYMA and the
information in the remaining arguments, ratsimp performs a rational simplification.
(c_1) gradef(d(x,y), x/d(x,y), y/d(x,y))$
(c_2) 'diff(d(x(t),y(t)),t);

\frac{d}{dt}\, d(x(t), y(t)) \qquad (d_2)

(c_3) ev(%, diff);

Time= 200 msec.

\frac{y(t)\,\frac{d}{dt}y(t)}{d(x(t), y(t))} + \frac{x(t)\,\frac{d}{dt}x(t)}{d(x(t), y(t))} \qquad (d_3)

(c_4) ratsimp(ev(%, d(x,y) := sqrt(x^2 + y^2)));

Time= 716 msec.

\frac{y(t)\,\frac{d}{dt}y(t) + x(t)\,\frac{d}{dt}x(t)}{\sqrt{y(t)^2 + x(t)^2}} \qquad (d_4)

The next part of the computation illustrates the fact that use of the gradef function
may  entail  the  user  performing  simplifications  that  MACSYMA  will  not  know  because  it 
was  not  given  the  definition  of  the  distance  function.  Thus,  even  though  the  definition 
of  the  distance  should  not  be  used  in  the  computations,  the  distance  squared  is  simple, 
and  this  fact  should  be  used.  The  MACSYMA  command  subst  performs  a  substitution. 


(c_5) 'diff(d(x,y),x,2);

\frac{d^2}{dx^2}\, d(x,y) \qquad (d_5)

(c_6) ev(%, diff);
Time= 200 msec.

\frac{1}{d(x,y)} - \frac{x^2}{d(x,y)^3} \qquad (d_6)

(c_7) ratsimp(%);
Time= 166 msec.

\frac{d(x,y)^2 - x^2}{d(x,y)^3} \qquad (d_7)

(c_8) subst(x^2+y^2, d(x,y)^2, %);
Time= 66 msec.

\frac{y^2}{d(x,y)^3} \qquad (d_8)


The use of ratsubst instead of subst in (c_8) would replace d^3 by (x^2 + y^2) d.

It may not be clear to the reader that there is any great advantage to using the gradef
function. It is important to imagine a situation where the expressions being manipulated
contain hundreds or thousands of terms; then slight variations in the method of
writing the expressions can make a great difference in the size of the expressions and
thus affect how fast MACSYMA can manipulate the expressions. Some of our computations
involved expressions that contained several hundred instances of the three-dimensional
distance d(x,y,z) with x, y, and z themselves replaced by derivatives. To
directly numerically evaluate such an expression, where the square root function would
be called several hundred times, is inefficient. One must at least replace all of the square
roots with a single variable and then evaluate the single variable once before the large
expression is evaluated. Using gradef makes this easy. In addition, if a large expression
contains d(x,y) rather than the definition of d(x,y) in terms of a square root,
MACSYMA can manipulate the expression faster because d(x,y) is manipulated like a
simple symbol in computations other than differentiation.


What is the best way to evaluate expressions that contain d? The difficulties can be
illustrated using the simple polynomial,

$\sum_{i=0}^{n} d^i$    (4)

for small values of n. The point of interest here is that the value of $d^2$ is known
before the value of d is known. One way to take advantage of this is to write the
polynomial (when n = 4) as

$\mathrm{exp} = 1 + d + d^2 + d \times d^2 + (d^2)^2.$    (5)

This form of exp can be computed in MACSYMA using the ratsubst (rational substitution)
function.

(c_1) exp : sum(d^i,i,0,4);

Time= 500 msec.

$1 + d + d^2 + d^3 + d^4$    (d_1)

(c_2) exp : ratsubst(d2,d^2,exp);

Time= 66 msec.

$1 + \mathit{d2} + d\,(1 + \mathit{d2}) + \mathit{d2}^2$    (d_2)
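The benefit of knowing the squared distance before the distance itself can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names poly_direct and poly_with_d2 are hypothetical and are not part of the toolkit. It compares an evaluation that calls the square root for every term with an evaluation of form (5), which takes a single square root.

```python
import math


def poly_direct(x, y, n=4):
    """Sum d**i for i = 0..n, recomputing the square root for every term."""
    return sum(math.sqrt(x * x + y * y) ** i for i in range(n + 1))


def poly_with_d2(x, y):
    """Form (5) for n = 4: compute d2 = x**2 + y**2 first, then one square root."""
    d2 = x * x + y * y   # the squared distance needs no square root
    d = math.sqrt(d2)    # the only square root in the whole evaluation
    return 1 + d + d2 + d * d2 + d2 * d2


if __name__ == "__main__":
    # Both forms should agree up to rounding.
    print(poly_direct(3.0, 4.0), poly_with_d2(3.0, 4.0))
```

In a large generated expression the same idea applies with hundreds of occurrences of d: the squared distance and its single root are computed once and reused.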

If the simplicity of $d^2$ is ignored, what can MACSYMA do with such an expression?
This question is the same as asking what MACSYMA can do with general expressions.
MACSYMA has two useful functions, horner and optimize, which can be used to minimize
operation counts in expressions. The following MACSYMA run illustrates the use of
these functions. First consider Horner's rule:

(c_1) eqn : d = sqrt(x^2 + y^2);

$d = \sqrt{x^2 + y^2}$    (d_1)

(c_2) exp : sum(d^i,i,0,4);

Time= 533 msec.

$1 + d + d^2 + d^3 + d^4$    (d_2)

(c_3) exp : horner(exp);

Time= 50 msec.

$1 + d\,(1 + d\,(1 + d\,(1 + d)))$    (d_3)
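For readers unfamiliar with Horner's rule, the nested form in (d_3) can be sketched in a few lines of Python. This is a generic textbook version, not the implementation of the MACSYMA horner function; the names horner and naive are chosen here for illustration only.

```python
def horner(coeffs, d):
    """Evaluate sum(coeffs[i] * d**i) by Horner's rule.

    coeffs[0] is the constant term; the loop starts from the highest power
    and performs one multiplication and one addition per coefficient.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * d + c
    return result


def naive(coeffs, d):
    """Term-by-term evaluation with explicit powers, used only for comparison."""
    return sum(c * d ** i for i, c in enumerate(coeffs))


if __name__ == "__main__":
    ones = [1, 1, 1, 1, 1]          # the polynomial 1 + d + d^2 + d^3 + d^4
    print(horner(ones, 2.0), naive(ones, 2.0))
```

The nested form avoids computing each power of d separately, which is the reason it reduces the multiplication count.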

If  the  value  of  d  is  substituted  into  the  previous  expression  and  the  expression  is 
expanded,  the  result  will  be  a  rather  large  expression.  What  can  optimize  do  to  recover 
the  simplicity  of  the  original  expression? 

(c_4) exp : ev(expand(exp),eqn);

Time= 550 msec.

$1 + x^2 + x^4 + y^2 + 2x^2y^2 + y^4 + \sqrt{x^2 + y^2} + x^2\sqrt{x^2 + y^2} + y^2\sqrt{x^2 + y^2}$    (d_4)

(c_5) optimize(exp);

Time= 3200 msec.

block([a,b,c], a : x^2, b : y^2, c : sqrt(a + b),
      1 + a + x^4 + b + 2*a*b + y^4 + c + a*c + b*c)    (d_5)

The last expression is a MACSYMA block that will return the same value as exp.
Unfortunately, the optimize function did not do quite what was expected; it is
preferable that $x^4$ be replaced by $a^2$ and $y^4$ by $b^2$. Also, the example is a
bit unfair in that the optimize function was given the polynomial in the worst form
that could be found. However, this does illustrate the point that it is a good idea to
write expressions so that common subexpressions are easily recognized. Note that the
computation time for the optimize function was large compared to the other
computations.

The  operation  counts  for  the  various  forms  of  exp  are  (assuming  that  the  4-th  powers 
are  fixed  up): 


Form      ×    +

(5)       5    4
(d_3)     5    5
(d_5)     8    8

Clearly there are a great many ways of writing expressions that involve distance
functions. If the expression is rational rather than just a polynomial, and contains
the distance function with several different arguments, the situation is even more
difficult. In the toolkit, expressions are written in terms of the distance function
(i.e. not using square roots) and then a variant of Horner's rule is applied to the
expression.

III. Determinants and Matrices In the FORTRAN code generation computations
there are many two-by-two and three-by-three matrices whose entries depend on several
parameters. The functions in the toolkit differentiate the matrices, powers and
inverses of them, determinants, and powers of the determinants with respect to each of
several parameters. In the following computation, the inverse of the determinant of a
three-by-three matrix is computed explicitly. The result is then differentiated with
respect to one of the entries in the determinant, and “simplified”. In large
computations, such simplifications are essential; here the simplification causes
problems analogous to those met in larger calculations. The denominator of the
resulting large expression is the square of the determinant, a fact that can be
discovered using the MACSYMA factor function. Note that the factoring uses more cpu
time than any other part of the calculation. The remvalue function removes the value
previously assigned to a variable.


(c_1) m : matrix([a, b, c], [r, s, t], [x, y, z]);

$\begin{bmatrix} a & b & c \\ r & s & t \\ x & y & z \end{bmatrix}$    (d_1)

(c_2) exp : 'diff(DELTA^(-1),x);

$\frac{d}{dx}\,\frac{1}{\Delta}$    (d_2)

(c_3) DELTA : expand(determinant(m));
Time= 283 msec.

$a s z - b r z - a t y + c r y + b t x - c s x$    (d_3)

(c_4) combine(expand(ev(exp, diff)));

Time= 1650 msec.

$$\begin{aligned}
(c s - b t)\,/\,(&a^2 s^2 z^2 - 2 a b r s z^2 + b^2 r^2 z^2 - 2 a^2 s t y z + 2 a b r t y z \\
&+ 2 a c r s y z - 2 b c r^2 y z + 2 a b s t x z - 2 b^2 r t x z \\
&- 2 a c s^2 x z + 2 b c r s x z + a^2 t^2 y^2 - 2 a c r t y^2 \\
&+ c^2 r^2 y^2 - 2 a b t^2 x y + 2 a c s t x y + 2 b c r t x y \\
&- 2 c^2 r s x y + b^2 t^2 x^2 - 2 b c s t x^2 + c^2 s^2 x^2)
\end{aligned}$$    (d_4)

(c_5) factor(denom(%));

Time= 13483 msec.

$(a s z - b r z - a t y + c r y + b t x - c s x)^2$    (d_5)

(c_6) remvalue(DELTA)$

The next part of the calculation shows how to organize the above calculation so that
the large denominator is avoided. The calculation is based on a well-known formula for
the derivative of a determinant. Let $m$ be a square matrix, $\hat{m}$ be the cofactor
matrix of $m$, and $\Delta$ be the determinant of $m$. The element $\hat{m}_{ij}$ of
the cofactor matrix is given by $(-1)^{i+j}$ times the determinant of the matrix
obtained by deleting the $i$-th row and $j$-th column from $m$. If superscript $t$
stands for transpose, then $m \times \hat{m}^t = \Delta I$ where $I$ is the identity
matrix; that is, $\hat{m}^t/\Delta$ is the inverse matrix of $m$. The cofactor
expansion of the determinant about the $i$-th row of $m$ gives

$\Delta = \sum_j m_{ij}\,\hat{m}_{ij}.$    (1)

None of the cofactors of the $i$-th row depends on the entries of that row, so
differentiating this with respect to $m_{ij}$ gives

$\frac{\partial \Delta}{\partial m_{ij}} = \hat{m}_{ij}.$    (2)


Let the function detr stand for the determinant of a three-by-three matrix and the
function minr stand for the determinant of a two-by-two matrix. The arguments of the
functions are the successive rows of the underlying matrix. The following MACSYMA
computation illustrates this idea.


(c_7) gradef(detr(a, b, c, r, s, t, x, y, z),
          + minr(s,t,y,z), - minr(r,t,x,z), + minr(r,s,x,y),
          - minr(b,c,y,z), + minr(a,c,x,z), - minr(a,b,x,y),
          + minr(b,c,s,t), - minr(a,c,r,t), + minr(a,b,r,s))$
(c_8) gradef(minr(a, b, r, s), + s, - r, - b, + a)$
(c_9) exp : 'diff(DELTA^(-1),a);

Time= 16 msec.

$\frac{d}{da}\,\frac{1}{\Delta}$    (d_9)

(c_10) DELTA : detr(a, b, c, r, s, t, x, y, z)$
(c_11) ev(exp, diff);

Time= 150 msec.

$-\frac{\mathrm{minr}(s,t,y,z)}{\mathrm{detr}(a,b,c,r,s,t,x,y,z)^2}$    (d_11)


The  reason  for  the  simplicity  is  that  the  differentiation  is  done  using  grade f  before  the 
value  of  the  determinant  is  specified. 
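Formula (2) and the result (d_11) can be checked with exact rational arithmetic. The Python sketch below is an independent illustration, not toolkit code; det2 and det3 play the roles of minr and detr, and the names cofactor, d_det, and d_inverse_det are hypothetical. It relies on the fact that a determinant is linear in any single entry, so a unit forward difference gives the exact derivative.

```python
from fractions import Fraction


def det2(a, b, r, s):
    """Two-by-two determinant with rows (a, b) and (r, s); the role of minr."""
    return a * s - b * r


def det3(m):
    """Three-by-three determinant by cofactor expansion about the first row."""
    (a, b, c), (r, s, t), (x, y, z) = m
    return a * det2(s, t, y, z) - b * det2(r, t, x, z) + c * det2(r, s, x, y)


def cofactor(m, i, j):
    """(-1)**(i+j) times the minor obtained by deleting row i and column j."""
    rows = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det2(rows[0][0], rows[0][1], rows[1][0], rows[1][1])


def d_det(m, i, j):
    """Exact derivative of det3 with respect to entry (i, j).

    The determinant is linear in each single entry, so a unit forward
    difference is exact here, not an approximation.
    """
    bumped = [list(row) for row in m]
    bumped[i][j] += 1
    return det3(bumped) - det3(m)


def d_inverse_det(m, i, j):
    """Derivative of 1/det with respect to entry (i, j): -cofactor / det**2."""
    return -cofactor(m, i, j) / det3(m) ** 2


if __name__ == "__main__":
    m = [[Fraction(2), Fraction(1), Fraction(3)],
         [Fraction(0), Fraction(4), Fraction(1)],
         [Fraction(5), Fraction(2), Fraction(6)]]
    print(all(d_det(m, i, j) == cofactor(m, i, j)
              for i in range(3) for j in range(3)))
```

The point mirrored from the MACSYMA run is that the derivative of the inverse determinant is expressed through one small minor and the unexpanded determinant, so no large denominator ever has to be factored.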

The  problem  of  computing  derivatives  of  the  inverse  of  a  matrix  (rather  than  the 
inverse  of  the  determinant  of  the  matrix)  presents  a  similar  challenge.  One  way  to 
proceed  is  to  compute  the  inverse  of  the  matrix  and  then  differentiate.  Note  that  this 
formula  contains  the  inverse  of  the  determinant,  so  the  difficulty  described  above  causes 
even  more  serious  problems.  A  better  way  to  proceed  is  to  use  the  known  formula 

$\frac{\partial A^{-1}}{\partial t} = -A^{-1}\,\frac{\partial A}{\partial t}\,A^{-1}.$

This  allows  the  computation  of  the  derivative  before  information  about  the  inverse  is 
used.  This  formula  is  used  in  a  critical  way  by  Steinberg  and  Roache10  (see  Formula  2.8, 
page  257). 
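The formula for the derivative of an inverse matrix can be checked numerically on a small example. The following Python sketch is an illustration only: the parameter-dependent matrix A(t) is a hypothetical example invented for this check, and the helper names are not from the toolkit. The formula is compared against a centered difference of the explicitly computed inverse.

```python
def inv2(m):
    """Inverse of a two-by-two matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matmul2(p, q):
    """Product of two two-by-two matrices stored as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def A(t):
    """A hypothetical parameter-dependent matrix used only for this check."""
    return ((2.0 + t, t * t), (1.0, 3.0 - t))


def dA(t):
    """Entry-by-entry derivative of A(t) with respect to t."""
    return ((1.0, 2.0 * t), (0.0, -1.0))


def d_inverse_formula(t):
    """-A^{-1} (dA/dt) A^{-1}, computed without differentiating the inverse."""
    ainv = inv2(A(t))
    prod = matmul2(matmul2(ainv, dA(t)), ainv)
    return tuple(tuple(-e for e in row) for row in prod)


def d_inverse_numeric(t, h=1e-5):
    """Centered difference of the explicitly computed inverse, for comparison."""
    plus, minus = inv2(A(t + h)), inv2(A(t - h))
    return tuple(
        tuple((plus[i][j] - minus[i][j]) / (2 * h) for j in range(2))
        for i in range(2)
    )


if __name__ == "__main__":
    print(d_inverse_formula(0.5))
    print(d_inverse_numeric(0.5))
```

In the symbolic setting the advantage is the same as for the determinant: the derivative is formed from the unexpanded inverse, so the entries of the inverse are substituted only at the end.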


IV. Finite Differences The toolkit must convert partial differential equations to
finite difference form. This process can be illustrated using the elementary ordinary
differential equation

$a u'' + b u' + c u = 0$    (1)

where $'$ stands for differentiation and $a$, $b$, and $c$ are constants. If simple
second-order centered differences are used, the finite-difference form of the above
equation is

$R_i u_{i+1} + C_i u_i + L_i u_{i-1} = 0$    (2)

where $R$, $C$, and $L$ are, respectively, the right, center, and left coefficients of
the stencil for the difference equation. The toolkit is used to calculate the formulas
for the coefficients of the stencil.

The  simplest  way  to  do  this  in  MACSYMA  is  to  substitute  finite  differences  for  the 
derivatives,  expand  the  result,  and  then  collect  the  coefficients  of  the  differences.  The 
toolkit13  contains  a  function,  difference,  that  returns  centered  differences,  which  is  used 
here. 

(c_1) ode : a*diff(u(t),t,2) + b*diff(u(t),t) + c*u(t) = 0;

$a\,\frac{d^2}{dt^2}u(t) + b\,\frac{d}{dt}u(t) + c\,u(t) = 0$    (d_1)

(c_2) d[0] : difference(u,1);

$u_i$    (d_2)

(c_3) d[1] : difference(u,1,1);

$\frac{u_{i+1} - u_{i-1}}{2\Delta}$    (d_3)

(c_4) d[2] : difference(u,1,1,1);

$\frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta^2}$    (d_4)

(c_5) deq : ode$

(c_6) for ii:2 step -1 thru 0 do
          deq : subst(d[ii], diff(u(t),t,ii), deq)$
Time= 283 msec.

(c_7) deq : expand(deq);

Time= 383 msec.

$\frac{b\,u_{i+1}}{2\Delta} + \frac{a\,u_{i+1}}{\Delta^2} - \frac{2a\,u_i}{\Delta^2} + c\,u_i - \frac{b\,u_{i-1}}{2\Delta} + \frac{a\,u_{i-1}}{\Delta^2} = 0$    (d_7)

(c_8) stencil : coeff(deq, u[i+1]);
Time= 66 msec.

$\frac{b}{2\Delta} + \frac{a}{\Delta^2}$    (d_8)


The reason for substituting for the higher derivatives first in line (c_6) can be
found by reading the discussion of the derivsubst flag in the MACSYMA Manual9.

In the above computation, the substitution was performed before the coefficients
were taken. The order of these computations can be interchanged:


(c_9) stencil : sum(
          ratcoeff(d[ii], u[i+1])*coeff(ode, diff(u(t),t,ii)),
          ii, 0, 2);

Time= 866 msec.

$\frac{b}{2\Delta} + \frac{a}{\Delta^2}$    (d_9)


To get the correct coefficient in an expression in MACSYMA, the expression must be
expanded or the command ratcoeff must be used.

For such a simple problem, the two methods are equivalent. However, as the size of
the differential equation grows and the space dimension increases, the second method
is substantially more efficient because it eliminates unnecessary algebra. In the
comments section of Steinberg and Roache10, it is pointed out that the Laplacian in
general coordinates in three dimensions (in fully expanded form) contains 1611 terms.
For problems of this size, the first approach discussed in this section produced
symbol code that ran for over 60 cpu hours. A combination of the ideas discussed in
this paper reduced this runtime to 8 cpu minutes.
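The second method, in which each stencil coefficient is assembled directly from the difference weights and the coefficients of the differential equation, can be sketched numerically. The Python fragment below is a minimal illustration for the one-dimensional equation (1) only; the names difference_weights and stencil are hypothetical and do not reproduce the toolkit's difference function.

```python
def difference_weights(order, h):
    """Centered-difference weights on (u[i-1], u[i], u[i+1]).

    Orders 0, 1, and 2 correspond to u, u', and u'' with step size h.
    """
    if order == 0:
        return (0.0, 1.0, 0.0)
    if order == 1:
        return (-1.0 / (2 * h), 0.0, 1.0 / (2 * h))
    if order == 2:
        return (1.0 / h**2, -2.0 / h**2, 1.0 / h**2)
    raise ValueError("only orders 0, 1, 2 are handled in this sketch")


def stencil(ode_coeffs, h):
    """Return (L, C, R) for the equation sum_k ode_coeffs[k] * u^(k) = 0.

    ode_coeffs[k] multiplies the k-th derivative, so (c, b, a) encodes
    a*u'' + b*u' + c*u.  Each stencil coefficient is a sum of
    (difference weight) * (equation coefficient); the full difference
    equation is never built or expanded.
    """
    left = center = right = 0.0
    for order, coeff in enumerate(ode_coeffs):
        wl, wc, wr = difference_weights(order, h)
        left += wl * coeff
        center += wc * coeff
        right += wr * coeff
    return left, center, right


if __name__ == "__main__":
    a, b, c, h = 2.0, 3.0, 5.0, 0.5
    print(stencil((c, b, a), h))
```

For this equation the right coefficient is a/h^2 + b/(2h), the center coefficient is c - 2a/h^2, and the left coefficient is a/h^2 - b/(2h), which is what the sums produce.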

V. Translating to FORTRAN In the FORTRAN codes it is necessary to numerically
evaluate many expressions that are defined in terms of derivatives of more fundamental
quantities. In MACSYMA, these quantities are represented in terms of derivatives,
while in the FORTRAN code the quantities are represented in terms of finite
differences. Such quantities are evaluated by first approximating the derivatives of
the fundamental quantity using finite differences and then representing the dependent
quantities in terms of the finite differences of the fundamental quantities. This can
be illustrated by computing the Jacobian of a change of variables in three dimensions.

Before  the  computation  is  started,  the  effect  of  the  simp  flag  needs  to  be  understood. 


(c_1) simp : false$

(c_2) 1*3+0;

$1 \cdot 3 + 0$    (d_2)

(c_3) %, simp;

$3$    (d_3)

Setting  the  simp  flag  to  false  inhibits  all  simplification;  in  the  line  (c_3)  the  expression  is 
evaluated  with  this  flag  set  to  true. 

Also, the FORTRAN formulas for the finite differences are easy to generate using the
difference function defined in the toolkit13 and the fortran function provided in
MACSYMA.

(c_1) fortran(concat(x,1) = difference(x,3,1));
      x1 = (x(i+1,j,k)-x(i-1,j,k))*(2*h1)**(-1)
Time= 600 msec.

The symbol x1 stands for the derivative of x with respect to its first argument.
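The shape of the generated FORTRAN statement can be mimicked with a short string-building routine. The Python sketch below is hypothetical: fortran_centered_difference is a name invented here, and it only reproduces the output format shown above, not the toolkit's difference function or the MACSYMA fortran function.

```python
def fortran_centered_difference(name, direction, indices=("i", "j", "k")):
    """Return a FORTRAN assignment for the centered first difference of `name`.

    `direction` is 1-based and selects which index is shifted.  The step size
    is written as h1, h2, or h3, following the h1 in the output above.
    """
    def shifted(offset):
        # Shift only the index that belongs to the chosen direction.
        parts = [
            f"{idx}{offset}" if pos == direction - 1 else idx
            for pos, idx in enumerate(indices)
        ]
        return f"{name}({','.join(parts)})"

    return (f"{name}{direction} = ({shifted('+1')}-{shifted('-1')})"
            f"*(2*h{direction})**(-1)")


if __name__ == "__main__":
    # prints: x1 = (x(i+1,j,k)-x(i-1,j,k))*(2*h1)**(-1)
    print(fortran_centered_difference("x", 1))
```

Generating such statements from one routine keeps the naming convention (x1, x2, ..., z3) consistent with the symbols substituted into the larger expressions.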

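Here is the computation of the Jacobian of a transformation where $\xi$, $\eta$, and
$\zeta$ are the old variables and x, y, and z are the new variables:

(c_1) v : [x(xi, eta, zeta), y(xi, eta, zeta),
           z(xi, eta, zeta)]$

(c_2) nu : [xi, eta, zeta]$
(c_3) m[i,j] := diff(v[i], nu[j]);

(c_4) m : genmatrix(m,3,3,1,1);

Time= 366 msec.

$\begin{bmatrix}
\frac{d}{d\xi}x(\xi,\eta,\zeta) & \frac{d}{d\eta}x(\xi,\eta,\zeta) & \frac{d}{d\zeta}x(\xi,\eta,\zeta) \\
\frac{d}{d\xi}y(\xi,\eta,\zeta) & \frac{d}{d\eta}y(\xi,\eta,\zeta) & \frac{d}{d\zeta}y(\xi,\eta,\zeta) \\
\frac{d}{d\xi}z(\xi,\eta,\zeta) & \frac{d}{d\eta}z(\xi,\eta,\zeta) & \frac{d}{d\zeta}z(\xi,\eta,\zeta)
\end{bmatrix}$    (d_4)

(c_5) jacobi : expand(determinant(m));

Time= 466 msec.

$$\begin{aligned}
&\tfrac{d}{d\zeta}x(\xi,\eta,\zeta)\,\tfrac{d}{d\xi}y(\xi,\eta,\zeta)\,\tfrac{d}{d\eta}z(\xi,\eta,\zeta)
 - \tfrac{d}{d\xi}x(\xi,\eta,\zeta)\,\tfrac{d}{d\zeta}y(\xi,\eta,\zeta)\,\tfrac{d}{d\eta}z(\xi,\eta,\zeta) \\
&- \tfrac{d}{d\zeta}x(\xi,\eta,\zeta)\,\tfrac{d}{d\eta}y(\xi,\eta,\zeta)\,\tfrac{d}{d\xi}z(\xi,\eta,\zeta)
 + \tfrac{d}{d\eta}x(\xi,\eta,\zeta)\,\tfrac{d}{d\zeta}y(\xi,\eta,\zeta)\,\tfrac{d}{d\xi}z(\xi,\eta,\zeta) \\
&+ \tfrac{d}{d\xi}x(\xi,\eta,\zeta)\,\tfrac{d}{d\eta}y(\xi,\eta,\zeta)\,\tfrac{d}{d\zeta}z(\xi,\eta,\zeta)
 - \tfrac{d}{d\eta}x(\xi,\eta,\zeta)\,\tfrac{d}{d\xi}y(\xi,\eta,\zeta)\,\tfrac{d}{d\zeta}z(\xi,\eta,\zeta)
\end{aligned}$$    (d_5)

(c_6) vars : [x, y, z]$
Time= 33 msec.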

(c_7) for i thru 3 do for j thru 3 do (
          temp1 : concat(vars[i],j),
          temp2 : diff(v[i], nu[j]),
          simp : false,
          jacobi : subst(temp1, temp2, jacobi),
          simp : true,
          enddo)$

Time= 983 msec.

(c_8) simp : false$

(c_9) jacobi;

x3 y1 z2 + (-1) x1 y3 z2 + (-1) x3 y2 z1
    + x2 y3 z1 + x1 y2 z3 + (-1) x2 y1 z3    (d_9)

(c_10) simp : true$

(c_11) jacobi;

- x3 y2 z1 + x2 y3 z1 + x3 y1 z2
    - x1 y3 z2 - x2 y1 z3 + x1 y2 z3    (d_11)

The substitutions performed in line (c_7) replace derivatives by simple symbols.
Consequently, the resulting expression cannot be simplified; it can only be written
more compactly. However, the MACSYMA simplifier (see Section 3.3 of the MACSYMA
Manual9) will normally try to simplify the expression because MACSYMA does not know
that no simplifications are possible. During the simplification attempt, the terms of
the expression will be put into a canonical order. Because parts of the expression
have been renamed, this can take considerable time. The simp flag is used to prevent
the irrelevant reorderings.
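The six-term Jacobian in (d_11) can be reproduced by expanding the determinant over permutations and writing each term with the symbols x1, ..., z3. The Python sketch below is an independent illustration, not the toolkit's method; the function names parity, jacobian_terms, to_fortran, and evaluate are hypothetical.

```python
from itertools import permutations


def parity(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign


def jacobian_terms(names=("x", "y", "z")):
    """List (sign, factors) for the determinant of the matrix of symbols.

    Row i holds the derivatives of names[i]; the symbol x2, for example,
    stands for the derivative of x with respect to the second old variable.
    """
    n = len(names)
    terms = []
    for perm in permutations(range(n)):
        factors = [f"{names[i]}{perm[i] + 1}" for i in range(n)]
        terms.append((parity(perm), factors))
    return terms


def to_fortran(terms, lhs="jacobi"):
    """Write the signed products as one FORTRAN assignment, unsimplified."""
    pieces = []
    for sign, factors in terms:
        pieces.append(("+" if sign > 0 else "-") + "*".join(factors))
    return f"{lhs} = " + "".join(pieces)


def evaluate(terms, values):
    """Numerically evaluate the terms given a dict such as {'x1': 1.0, ...}."""
    total = 0.0
    for sign, factors in terms:
        product = 1.0
        for f in factors:
            product *= values[f]
        total += sign * product
    return total


if __name__ == "__main__":
    print(to_fortran(jacobian_terms()))
```

Because the factors are plain symbols, nothing in the generated expression can cancel, which is the same reason the MACSYMA run switches simplification off during the substitution.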

VI. A Multivariate Horner's Rule After using all of the above techniques in the
FORTRAN code generation problems, many expressions are still present which are large
sparse multivariate polynomials. The toolkit contains a function called horners that
chooses one of the variables in the polynomial and then uses the univariate MACSYMA
horner function to write the polynomial in Horner's form. The coefficients of the
resulting expression, which are labeled vt1, vt2, ..., are again multivariate
polynomials, with one less variable. The horners function is recursively applied to
the coefficients of the resulting polynomials until the coefficients contain at most
two terms.

The effects of this multivariate Horner's rule can be seen by applying horners to the
square of the determinant of the three-by-three matrix from Section III. The horners
function returns a list of formulas, so the MACSYMA function ldisp is used to display
the formulas nicely.

(c_1) m : matrix([a,b,c],[r,s,t],[x,y,z])$

(c_2) exp : vj2 = expand(determinant(m)^2);

Time= 1250 msec.

$$\begin{aligned}
\mathit{vj2} = {} & c^2 s^2 x^2 - 2 b c s t x^2 + b^2 t^2 x^2 - 2 c^2 r s x y \\
& + 2 b c r t x y + 2 a c s t x y - 2 a b t^2 x y + c^2 r^2 y^2 \\
& - 2 a c r t y^2 + a^2 t^2 y^2 + 2 b c r s x z - 2 a c s^2 x z \\
& - 2 b^2 r t x z + 2 a b s t x z - 2 b c r^2 y z + 2 a c r s y z \\
& + 2 a b r t y z - 2 a^2 s t y z + b^2 r^2 z^2 - 2 a b r s z^2 \\
& + a^2 s^2 z^2
\end{aligned}$$    (d_2)

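(c_3) result : horners([exp])$
Time= 175450 msec.

(c_4) for i thru length(result) do ldisp(result[i]);

$\mathit{vt3} = s\,(s\,x^2 - 2\,r\,x\,y) + r^2 y^2$    (e_4)

$\mathit{vt14} = 2\,a\,t\,y + 2\,b\,r\,z$    (e_5)

$\mathit{vt12} = x\,(\mathit{vt14} - 2\,b\,t\,x) + 2\,a\,r\,y\,z$    (e_6)

$\mathit{vt13} = y\,(2\,b\,t\,x - 2\,a\,t\,y)$    (e_7)

$\mathit{vt11} = r\,(\mathit{vt13} - 2\,b\,r\,y\,z)$    (e_8)

$\mathit{vt2} = \mathit{vt11} + s\,(\mathit{vt12} - 2\,a\,s\,x\,z)$    (e_9)

$\mathit{vt6} = t\,(t\,x^2 - 2\,r\,x\,z) + r^2 z^2$    (e_10)

$\mathit{vt10} = 2\,s\,x + 2\,r\,y$    (e_11)

$\mathit{vt9} = \mathit{vt10}\,z$    (e_12)

$\mathit{vt8} = t\,(\mathit{vt9} - 2\,t\,x\,y) - 2\,r\,s\,z^2$    (e_13)

$\mathit{vt5} = a\,\mathit{vt8}$    (e_14)

$\mathit{vt7} = t\,(t\,y^2 - 2\,s\,y\,z) + s^2 z^2$    (e_15)

$\mathit{vt4} = a^2\,\mathit{vt7}$    (e_16)

$\mathit{vt1} = \mathit{vt4} + b\,(\mathit{vt5} + b\,\mathit{vt6})$    (e_17)

$\mathit{vj2} = \mathit{vt1} + c\,(\mathit{vt2} + c\,\mathit{vt3})$    (e_18)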

Note the large amount of time required to do the Horner's rule calculation on this
relatively simple expression. In addition, there are still a significant number of
common subexpressions left in the resulting formulas. Note, for instance, that $r^2$
occurs in two different places. The problem of finding more sophisticated ways of
evaluating such expressions is of continuing interest.
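The recursive idea behind a multivariate Horner's rule can be sketched as a numerical evaluator. The Python code below is an assumption-laden illustration, not the toolkit's horners function: it always takes the first variable as the main variable instead of choosing one, it evaluates numbers instead of producing the labeled formulas vt1, vt2, ..., and the names horner_eval and naive_eval are hypothetical.

```python
def horner_eval(poly, point):
    """Evaluate a sparse multivariate polynomial by nested Horner's rule.

    `poly` maps exponent tuples to coefficients, e.g. {(2, 0): 1, (1, 1): -2}
    is x**2 - 2*x*y.  The first variable is the main variable; its
    coefficients are polynomials in the remaining variables and are
    evaluated by a recursive call.
    """
    if not poly:
        return 0
    if not point:                       # no variables left: a constant
        return sum(poly.values())
    # Group the terms by the exponent of the first variable.
    by_power = {}
    for exps, coeff in poly.items():
        sub = by_power.setdefault(exps[0], {})
        sub[exps[1:]] = sub.get(exps[1:], 0) + coeff
    x, rest = point[0], point[1:]
    result = 0
    for power in range(max(by_power), -1, -1):
        result = result * x
        if power in by_power:
            result += horner_eval(by_power[power], rest)
    return result


def naive_eval(poly, point):
    """Term-by-term evaluation, used only to check the Horner form."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for value, e in zip(point, exps):
            term *= value ** e
        total += term
    return total


if __name__ == "__main__":
    # vt3 written in the variables (s, x, r, y): s^2 x^2 - 2 r s x y + r^2 y^2
    vt3 = {(2, 2, 0, 0): 1, (1, 1, 1, 1): -2, (0, 0, 2, 2): 1}
    print(horner_eval(vt3, (2, 3, 1, 4)), naive_eval(vt3, (2, 3, 1, 4)))
```

A production version would also have to pick the main variable well and share common subexpressions such as $r^2$, which this sketch does not attempt.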

VII. Large Coding Projects The toolkit is reasonably large; the MACSYMA source
code requires about 79 kilobytes of storage and consists of 44 MACSYMA functions.
Section 10.9 of the MACSYMA Manual9 is called “Hints for Writers of Packages in
MACSYMA” and contains several helpful ideas for managing projects of this size. In
particular, the autoloading feature of MACSYMA, which is provided by the function
setup_autoload, is very useful. This function allows function definitions to be loaded
automatically when they are needed.

Suppose the current directory contains a file called f.mac that contains the
definition of the function f(x):

f(x) := x^3$

The following MACSYMA output illustrates how to autoload the definition of the
function f.

(c_1) f(3);

f(3)    (d_1)

(c_2) setup_autoload(f,f)$
(c_3) f(3);

Batching the file f.mac
Batching done.

27    (d_3)

Notice that the file f.mac was batched. The batching process is rather slow for long
function definitions. Loading can be sped up by using a LISP version or an object
version of the function definition. If the definition of the function f is currently
in MACSYMA, then the command

(c_1) save("f.l", f)$

will create a file with the name f.l that contains the LISP definition of the
function f:

(defprop $f f autoload)

(add2lnc '$f $props)

(mdefprop $f ((lambda nil)
    ((mlist) $x) ((mexpt) $x 3)) mexpr)

(add2lnc '(($f) $x) $functions)

The  LISP  form  of  the  function  can  be  converted  to  object  code  using  the  LISP  compiler 
LISZT;  the  object  code  will  normally  be  put  in  the  file  named  f.o. 

The object form of the function definition will load faster than the LISP form of a
function definition, which will, in turn, load considerably faster than the MACSYMA
form. The MACSYMA load function can load any of these files; it first looks for f.o,
then for f.l, and finally for f.mac.
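The search order described above (object file, then LISP file, then MACSYMA source) amounts to a simple precedence rule. The Python sketch below only illustrates that rule; it is not how the MACSYMA load function is implemented, and the names LOAD_ORDER and find_definition_file are hypothetical.

```python
import os

# Fastest-loading form first: object code, then LISP, then MACSYMA source.
LOAD_ORDER = (".o", ".l", ".mac")


def find_definition_file(name, directory="."):
    """Return the first existing file among name.o, name.l, name.mac, or None."""
    for extension in LOAD_ORDER:
        candidate = os.path.join(directory, name + extension)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Reports which form of the definition of f would be picked up, if any.
    print(find_definition_file("f"))
```

With this rule, compiling a definition is enough to make later loads faster; the source file can stay in place as a fallback.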

VIII. Comments The toolkit of MACSYMA symbol manipulation programs was
developed  so  that  it  could  be  used  to  write  FORTRAN  subroutines  which  are  used  to 
solve  boundary-value  problems  for  partial  differential  equations.  The  boundary-value 
problem  solver  has  been  used  to  model  lasers5’®  and  other  physical  devices.  The  solver 
uses  finite  difference  techniques6,8  to  solve  the  boundary  value  problem.  The  problems 
solved  are  interesting  because  they  are  posed  in  irregular  regions  and  consequently  the 
solver  must  generate  a  grid  in  the  region,  as  well  as  solve  the  problem  on  the  generated 
grid.  The  grids  were  previously  generated  using  elliptic  techniques  and  are  now  being 
generated  using  variational  techniques. 

The global structure of the toolkit is straightforward. As mentioned before, the
toolkit consists of 44 MACSYMA functions. The functions either implement a small
portion of the computations described in Steinberg and Roache12 or combine several
functions to do a more complicated task. In general, it is possible to understand the
MACSYMA code by following the mathematics12, the FORTRAN code listings14, and
the symbol code listing13. This paper discusses the situations where the
implementation is least obvious.

Surprisingly, MACSYMA does not handle differentiation well. This point has been
thoroughly discussed elsewhere2,15,16. The problem has to do with the way that
MACSYMA uses a dependencies (depends) notion to implement the chain rule for
differentiation. This works well for the differentiation of known functions, but not
so well for general functions. The toolkit has to deal with many general functions, so
this caused a substantial problem.

The computations in this paper were done on a Sun Microsystems workstation
(Sun2/160) with 4 mbytes of main memory and a 380 mbyte (formatted) Eagle disk
drive. The operating system is Sun UNIX 4.2 Release 2.0, which contains 4.2 BSD
updates. The Beta Test Release 308.2 of Symbolics, Inc. MACSYMA was used to do
the symbol manipulation. (The toolkit could be implemented in other symbol
manipulation languages.) The MACSYMA output and this paper were prepared using the
text processors tbl, eqn, and troff that are part of the standard UNIX distribution.

A SELF-ADAPTIVE GRIDDING FOR INVISCID TRANSONIC PROJECTILE

Gainesville, Florida 32611

Extended Abstract. A good grid system for the computation of complex
fluid dynamics problems can be judged by the smoothness of the grid, the
orthogonality of the grid, and a grid resolution adapted to the solution in
the physical space. In fact, the use of an improper grid can be detrimental to
the solution accuracy as well as to the convergence of a solution
algorithm. An adaptive grid generation method proposed by Brackbill [1] is
rather general and seems to be a very promising approach for complex flow
problems. In this method the governing differential equations for a 2-D
adaptive grid are derived by extremizing the general functional

$I = \int_D \left[(\nabla\xi)^2 + (\nabla\eta)^2\right] d\sigma + \lambda_o \int_D (\nabla\xi \cdot \nabla\eta)^2 J^3\, d\sigma + \lambda_w \int_D w J\, d\sigma$    (1)

in which $\xi$ and $\eta$ are curvilinear grid coordinates while $J$ is the Jacobian of
the transformation representing the grid size, which can be made adaptive to
the control function $w(x,y)$. The integrals in Eq. (1) are, respectively,
smoothness, orthogonality, and grid resolution functionals. Introducing
characteristic quantities $L_c$, $L_p$ and $W$, the Lagrange multipliers can be
chosen as

\> " a  *  \  “  p  w  (2) 

P  P 

so that each integral has the same order of magnitude provided $\alpha$ and $\beta$ are of
$O(1)$. Hence, the relative importance of the three integrals to a grid can be
identified from the values of $\alpha$ and $\beta$ chosen. An application of the adaptive
grid generation method to a 2-D inviscid supersonic flow past a step in a wind
tunnel has been studied by Saltzman [2], and the results obtained showed that
the adaptive mesh generator moves the computational grid with the shock fronts and
consequently significantly enhances the resolution, and hence the accuracy, of
the finite-difference scheme.

We have investigated an application of the adaptive grid generation
method to transonic inviscid flows past a secant-ogive-cylinder-boattail
projectile with sting at zero angle of attack. The control function for grid
resolution is chosen as the computed pressure gradient while a conforming
one-dimensional variational principle is employed for boundary grid adaptation.

We found that the grid generation method cannot provide good grids for use in
a thin-layer Navier-Stokes code [3] which has an option for solving inviscid
flow problems; consequently, a resulting grid has to be implemented with an
exponential clustering technique to provide good meshes near the projectile
surface. The results computed then were acceptably accurate and in good
agreement with measured data reported in [4]. Moreover, for generating a
better adaptive grid, we have modified the constrained variational principle,
Eq. (1), by considering variable parameters $\lambda_o$ and $\lambda_w$ for enhancing grid
resolution locally but assuming their variation zero; accordingly, local grid
spacings are chosen for the reference lengths $L_c$ and $L_p$ in Eq. (2). A grid
generation code has been developed and coupled to the Navier-Stokes code for
self-adaptive grid generation, and the numerical study conducted showed that
the adaptive grid generation technique developed indeed can provide, without
any experimentation, good grids for the transonic projectile aerodynamics
computation. For instance, with a strategy of generating a new adaptive grid
fixed at every 150 integration time steps of the Navier-Stokes code, the
results obtained for three flow cases (M = 0.91, 0.96, and 1.10) showed that
the computed surface pressure coefficient is in excellent agreement with the
reported measured data.

Complete paper has been submitted to International Journal for Numerical
Methods in Fluids for publication.

ON COMPUTATION OF TRANSONIC PROJECTILE AERODYNAMICS

Chen-Chi Hsu and Nae-Haur Shiau
University of Florida
Gainesville, Florida 32611

ABSTRACT. A representative transonic flow, $M_\infty = 0.96$, past a
secant-ogive-cylinder-boattail projectile model with sting at zero angle of attack
has been considered for detailed investigation on the application and
implication of the thin-layer Navier-Stokes approximation implemented with the
Baldwin-Lomax turbulence model for accurate prediction of the transonic
projectile aerodynamics. An axisymmetric thin-layer Navier-Stokes code and a
grid generation code obtained from the U.S. Army Ballistic Research Laboratory
are employed to solve the flow problem. The numerical results obtained from
the use of different hyperbolic grids show that the Navier-Stokes code can
provide accurate surface pressure if a good grid is provided. The results
also show that the accuracy of surface pressure is very sensitive not only to
the boundary grid resolution but also to the grid distribution in the normal
direction, while the shear stress distribution in the shock-boundary layer
interaction region depends strongly upon the predicted shock location. The
importance of a good adaptive grid is evidenced by the computed results,
which show that the ratio of pressure drag to skin-friction drag can be off by
as much as 40% from that of an accurate result.

I. INTRODUCTION. An accurate prediction of the aerodynamic force and
other flow characteristics is essential to a better design of aerodynamic
devices and flight vehicles. For a practical aerodynamic problem, wind-tunnel
experiments are traditionally performed to measure the aerodynamic force and
other desirable flow characteristics. With the rising cost of experimental
measurements, it is becoming extremely expensive to conduct parametric studies
in a wind tunnel; moreover, each test facility has a limited range of
application and consequently certain flow conditions of interest often cannot
be simulated. Hence, with the recent advent of supercomputers, the numerical
simulation of a complex aerodynamic problem has become an alternative
approach to complement the wind-tunnel experiment for effective design.

Recently a thin-layer Navier-Stokes code has been developed at NASA Ames
Research Center for unsteady three-dimensional high speed compressible flow
problems [1]. This code is based on the Reynolds-averaged thin-layer
Navier-Stokes equations for an ideal gas in a transformed boundary-fitted space, and
the transformed governing equations are approximated by the Beam and Warming
factorized finite difference scheme in which second-order implicit ($\epsilon_I$) and
fourth-order explicit ($\epsilon_E$) artificial dissipation terms have been added for
controlling numerical stability of the solution algorithm. The turbulence
closure model implemented is a two-layer algebraic eddy viscosity model [2].
The Navier-Stokes code also has been simplified for axisymmetric projectile
flow problems [3]. Both of these codes have an option for solving inviscid
flow problems, while a steady solution is obtained as the converged solution of
the unsteady flow problem.

The application of the Navier-Stokes codes to transonic projectile
aerodynamic problems has been investigated to some extent by the U.S. Army
Ballistic Research Laboratory [4-8]. The grid provided to a Navier-Stokes
code is an axisymmetric grid system formed by a sequence of planar grids
around the axis of a projectile model; the planar grid is obtained from a grid
generation code GRIDGEN which can give either an elliptic grid or a hyperbolic
grid [9]. For a secant-ogive-cylinder-boattail projectile with sting at zero
angle of attack, the published results showed that the computed surface
pressure coefficient $C_p$ on the secant-ogive portion and boattail portion of
the projectile agrees rather well with measured data but the agreement on the
cylinder portion (shock wave-boundary layer interaction region) is not very
satisfactory for some flow cases considered. For the projectile model at
two-degree angle of attack, the reported $C_p$-distribution agrees qualitatively with
measured data but quantitatively the agreement over the cylinder portion and
boattail portion is not satisfactory at all.

The published results have indicated that the thin-layer Navier-Stokes codes can give acceptably accurate surface pressure for the complex transonic projectile aerodynamic problem if a good adaptive grid system is provided. However, the precise causes of the unsatisfactory results reported are yet to be investigated; moreover, no result on the skin-friction coefficient has been reported and discussed. Therefore, the main objective of this study is to further advance our understanding of the application and implications of the thin-layer Navier-Stokes approximation, implemented with the Baldwin-Lomax algebraic turbulence model, for accurate prediction of the aerodynamic forces acting on a transonic projectile.

II. THE FLOW PROBLEM. A representative transonic flow of M∞ = 0.96 past a secant-ogive-cylinder-boattail (SOCBT) projectile model with sting at zero angle of attack is considered for detailed investigation in this study. The projectile model has a 3-caliber secant-ogive part followed by a 2-caliber cylinder and a 1-caliber 7-degree boattail, which is further extended for another 1.77 calibers to meet a horizontal sting. The projectile model with sting has a total length of 16 calibers. Surface pressure measurements have been reported for transonic flows at M∞ = 0.91, 0.94, 0.96, 0.98, 1.10 and 1.20 past the SOCBT projectile [5]. The flow problem considered is solved with an axisymmetric thin-layer Navier-Stokes code and the grid generation code GRIDGEN, both obtained from the Army Ballistic Research Laboratory. It is mentioned in passing that an averaging technique is used in the Navier-Stokes code for computing eddy viscosity, which results in improper zig-zag eddy viscosity distributions. Hence, the averaging scheme for eddy viscosity has been deleted from the code in this study.

III. RESULTS AND DISCUSSION. An application of the Navier-Stokes code to projectile aerodynamic problems requires that the user provide a planar grid. In this study the planar grid provided to the Navier-Stokes code is a modified hyperbolic grid [10]. As indicated in reference [9], the grid resolution of a hyperbolic grid is somewhat predetermined by the prescribed boundary grid distribution and the choice of a clustering function. In the grid generation code GRIDGEN an exponential clustering function is employed to ensure sufficiently fine grid resolution for the viscous sublayer; however, other clustering functions such as a hyperbolic tangent can also be used [11]. Figure 1(a) shows a 78 x 28 hyperbolic grid obtained from GRIDGEN, while Figure 1(b) is a 78 x 28 hyperbolic grid generated with a hyperbolic tangent clustering function. It is observed that the characteristics of the two clustering functions are exhibited in the distribution of normal grid points. A number of different hyperbolic grids have been considered for the flow problem to investigate the implications of the Navier-Stokes approximation and its sensitivity to the grid for accurate solutions.

Figure 1(a). A 78 x 28 hyperbolic grid based on exponential clustering function.

Figure 1(b). A 78 x 28 hyperbolic grid based on hyperbolic tangent clustering function.
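The difference between the two clustering functions can be illustrated with a minimal sketch. The formulas below are common one-sided stretching functions and are an assumption for illustration only; they are not taken from GRIDGEN, and the parameter names `beta` and `delta` and their values are hypothetical.

```python
import math


def exponential_clustering(n_points, y_max, beta):
    """Wall-normal points on [0, y_max], packed near y = 0 (exponential).

    beta is a hypothetical stretching parameter; larger values crowd
    more points toward the wall.
    """
    return [y_max * math.expm1(beta * j / (n_points - 1)) / math.expm1(beta)
            for j in range(n_points)]


def tanh_clustering(n_points, y_max, delta):
    """Wall-normal points on [0, y_max], packed near y = 0 (tanh).

    delta is a hypothetical stretching parameter.
    """
    return [y_max * (1.0 + math.tanh(delta * (j / (n_points - 1) - 1.0))
                     / math.tanh(delta))
            for j in range(n_points)]


def spacings(points):
    """Distances between consecutive grid points."""
    return [b - a for a, b in zip(points, points[1:])]


if __name__ == "__main__":
    grids = (("exponential", exponential_clustering(28, 1.0, 6.0)),
             ("tanh", tanh_clustering(28, 1.0, 3.0)))
    for name, grid in grids:
        dy = spacings(grid)
        print(f"{name:12s} first spacing {dy[0]:.3e}, last spacing {dy[-1]:.3e}")
```

With the exponential form the ratio of successive spacings is constant, whereas the hyperbolic tangent form varies the ratio smoothly across the layer, which is one way to read the "smoother grid" referred to in the discussion.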

A very accurate solution has been obtained for the flow problem with a 90 x 60 grid for a flow domain of about four projectile lengths, i.e., STOT = 24. Figure 2(a) shows that the computed surface pressure coefficient, Cp, is in excellent agreement with measured data, while the computed shear stress presented in Figure 2(b) indicates that there is no flow separation on the transonic projectile surface. The computed non-dimensional drag force due to surface pressure is Dp = 40.28 for the 6-caliber projectile, while the drag force resulting from the computed shear stress is Df = 20.53. This shows the importance of viscous flow computation for accurate aerodynamic force prediction.

A finite-difference approximation for the Navier-Stokes equations often requires artificial dissipation to control numerical instability. In the axisymmetric thin-layer Navier-Stokes code, second-order implicit (ε_I) and fourth-order explicit (ε_E) dissipation terms have been added. Hence, it is important to find out the effect of ε_I and ε_E upon the accuracy of numerical solutions. Since the accurate solution computed is based on ε_I = 2ε_E = 4Δt, the flow problem has been solved again with the same grid but with ε_I = 2ε_E = 8Δt. The computed Cp, as shown in Figure 2(a), seems to agree very well with the accurate solution; however, the resulting pressure drag force is Dp = 32.91, which is 18% less than that of the accurate solution. Figure 2(b) shows the difference in shear stress distribution, but the resulting shear drag force is Df = 20.12.
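The sensitivity quoted above can be checked with a short calculation. The sketch below only restates the drag values reported in the text; the function and variable names are illustrative.

```python
def relative_change(value, reference):
    """Signed fractional change of value with respect to reference."""
    return (value - reference) / reference


# Non-dimensional drag values quoted in the text (90 x 60 grid).
baseline = {"Dp": 40.28, "Df": 20.53}  # eps_I = 2 eps_E = 4 dt
doubled = {"Dp": 32.91, "Df": 20.12}   # eps_I = 2 eps_E = 8 dt

dp_change = relative_change(doubled["Dp"], baseline["Dp"])
df_change = relative_change(doubled["Df"], baseline["Df"])
friction_share = baseline["Df"] / (baseline["Dp"] + baseline["Df"])

if __name__ == "__main__":
    print(f"pressure drag change: {dp_change:+.1%}")
    print(f"friction drag change: {df_change:+.1%}")
    print(f"friction share of total drag (baseline): {friction_share:.1%}")
```

Doubling the dissipation coefficients changes the pressure drag by roughly 18% but the friction drag by only about 2%, and in the baseline solution friction accounts for about one third of the total drag.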

The effect of the averaging technique originally implemented in the Navier-Stokes code for computing eddy viscosity has also been investigated. Figure 3 shows the distribution of eddy viscosity with and without the averaging scheme at three different boundary point stations identified in Figure 2. It is clear that the averaging technique yields an improper zig-zag distribution; however, it has a negligible effect on the surface pressure. In fact, the computed Cp distribution is almost exactly the same as that of the accurate solution shown in Figure 2(a), but the shear stress is consistently less than that of the accurate solution over the entire projectile surface. The corresponding drag forces are Dp = 41.41 and Df = 18.99.

To assess the sensitivity of solution accuracy to a given grid, the flow problem with the domain of STOT = 24 has been solved again with two additional grids. The first grid is a 90 x 40 hyperbolic grid based on the exponential clustering function, while the second is a 90 x 40 hyperbolic grid based on the hyperbolic tangent clustering function. For these two grids, the characteristics of the normal grid distribution, similar to those shown in Figure 1, are quite different. The computed surface pressure presented in Figure 4(a) shows that the grid resolution with 40 points in the normal direction is not sufficient; moreover, the smoother grid resulting from the use of the hyperbolic tangent clustering function yields a better solution, with Dp = 27.38, than the other grid, with Dp = 23.95. The corresponding shear stress given in Figure 4(b) clearly indicates that the shear stress is very sensitive to the computed pressure field in the shock wave-boundary layer interaction regions; however, the resulting shear drag is about the same, 20.05 and 20.28. It should be pointed out that a smoother grid does not always provide a more accurate solution. In fact, the flow problem with a domain of STOT = 18 has also been solved with the 78 x 28 grids shown in Figure 1, and the results obtained show that the smoother grid gives a better result on the boattail portion, but the other grid yields a much more accurate solution on the cylinder portion of the projectile.

Figure 4(a). Surface pressure coefficient computed with different grids: 90 x 60 grid from GRIDGEN; 90 x 40 grid from GRIDGEN; 90 x 40 grid based on hyperbolic tangent clustering function.

Figure 4(b). The corresponding shear stress distribution computed with different grids.

IV. CONCLUDING REMARKS. It is concluded that the thin-layer Navier-Stokes approximation with the Baldwin-Lomax turbulence model can yield accurate solutions for transonic projectile aerodynamic problems without flow separation, provided a good grid system is employed in the solution algorithm. A good grid, in general, is characterized by the smoothness of the grid, the orthogonality of the grid and a grid resolution adapted to the solution fields. For a projectile aerodynamic problem with separation, however, the application of the Navier-Stokes equations implemented with the Baldwin-Lomax turbulence model for accurate prediction of aerodynamic forces is yet to be investigated and verified.

ABSTRACT. Implicit, approximately factored, finite difference codes have been developed for solving the Navier-Stokes equations in general body-fitted coordinates. For a protuberance such as the rotating band on an artillery shell, sharp geometric variations exist which make it extremely difficult to generate body-conforming grids while preserving the sharp corners. Using wrap-around grids for such cases introduces geometric errors and may lead to degradation of computational efficiency and accuracy. This paper describes the development and application of a computational procedure using flowfield blanking to compute the flow over a rotating band at supersonic speed with no geometric error.

I. INTRODUCTION. In recent years, a considerable research effort has been focused on the development of modern predictive capabilities for determining the aerodynamics of projectiles. A time-dependent Navier-Stokes computational technique has been used [1,2] to compute the flow over projectiles at transonic speeds. For supersonic flows, the space-marching parabolized Navier-Stokes computational technique [3] can be used effectively. However, this technique fails for flows containing separation regions in the streamwise direction. In such cases, which are encountered frequently in projectile aerodynamic simulations, the time-dependent Navier-Stokes technique can still be used [4,5].

The time-dependent Navier-Stokes equations are solved in a generalized body-fitted coordinate system. Many actual projectile configurations contain sharp corners and 90° bends; in other words, sharp geometric variations exist on the shell which make it extremely difficult to generate body-conforming grids while preserving the sharp corners. The grid lines are wrapped around the corners and, in many cases, such wrap-around grids are skewed near these corners and bends. Using such grids introduces geometric errors and may lead to a loss in both computational efficiency and accuracy. The purpose of this paper is to develop and apply a flow field blanking procedure which allows computation of practical flows of interest with no geometric error, since it models the corners and bends exactly.

To avoid geometric errors one can blank out the flow field in specific regions of the computational domain. Examples where such blanking can be useful are shown in Figure 1. Continuous straight-line grids can be used for these cases, and the hatched regions are the ones where the flow field is to be blanked out. This procedure thus preserves the sharp corners and bends. In addition to zeroing out the flow field inside the hatched regions, changes must be made to the boundary conditions on these zonal surfaces and to the computational algorithm near these surfaces; these are described in a later section. The simplest example with which to test this technique is the rotating band flow problem. The rotating band is a protuberance on the artillery shell and is primarily used to impart spin to the shell during launch. In free flight, however, it contributes a small unwanted drag. A schematic of the rotating band flow field is shown in Figure 2. It shows the expected recirculation regions in front of and behind the band and the associated compression and expansion waves. A numerical solution is obtained for this problem at M∞ = 3.0 and α = 0.

II.  COMPUTATIONAL  TECHNIQUE. 

A. GOVERNING EQUATIONS. The complete set of time-dependent generalized axisymmetric thin-layer Navier-Stokes equations is solved numerically to obtain a solution to this problem. The numerical technique used is an implicit finite difference scheme. Although time-dependent calculations are made, the transient flow is not of primary interest at the present time. The steady flow is the desired result, which is obtained in a time asymptotic fashion.


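The azimuthal invariant (or generalized axisymmetric) thin-layer Navier-Stokes equations for curvilinear coordinates ξ, η and ζ can be written as [1]

$$\partial_\tau \hat{q} + \partial_\xi \hat{E} + \partial_\zeta \hat{G} + \hat{H} = Re^{-1}\,\partial_\zeta \hat{S} \qquad (1)$$

where

ξ = ξ(x,y,z,t) is the longitudinal coordinate
η = η(y,z,t) is the circumferential coordinate
ζ = ζ(x,y,z,t) is the near normal coordinate
τ = t is the time

and

$$\hat{q} = J^{-1}\begin{bmatrix}\rho\\ \rho u\\ \rho v\\ \rho w\\ e\end{bmatrix},\quad
\hat{E} = J^{-1}\begin{bmatrix}\rho U\\ \rho uU+\xi_x p\\ \rho vU+\xi_y p\\ \rho wU+\xi_z p\\ (e+p)U-\xi_t p\end{bmatrix},\quad
\hat{G} = J^{-1}\begin{bmatrix}\rho W\\ \rho uW+\zeta_x p\\ \rho vW+\zeta_y p\\ \rho wW+\zeta_z p\\ (e+p)W-\zeta_t p\end{bmatrix}$$

$$\hat{S} = J^{-1}\begin{bmatrix}
0\\
\mu(\zeta_x^2+\zeta_y^2+\zeta_z^2)u_\zeta + (\mu/3)(\zeta_x u_\zeta+\zeta_y v_\zeta+\zeta_z w_\zeta)\zeta_x\\
\mu(\zeta_x^2+\zeta_y^2+\zeta_z^2)v_\zeta + (\mu/3)(\zeta_x u_\zeta+\zeta_y v_\zeta+\zeta_z w_\zeta)\zeta_y\\
\mu(\zeta_x^2+\zeta_y^2+\zeta_z^2)w_\zeta + (\mu/3)(\zeta_x u_\zeta+\zeta_y v_\zeta+\zeta_z w_\zeta)\zeta_z\\
\{(\zeta_x^2+\zeta_y^2+\zeta_z^2)[(\mu/2)(u^2+v^2+w^2)_\zeta + \kappa Pr^{-1}(\gamma-1)^{-1}(a^2)_\zeta]\\
\quad + (\mu/3)(\zeta_x u+\zeta_y v+\zeta_z w)(\zeta_x u_\zeta+\zeta_y v_\zeta+\zeta_z w_\zeta)\}
\end{bmatrix}$$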

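The velocities

$$\begin{aligned}
U &= \xi_t + \xi_x u + \xi_y v + \xi_z w\\
V &= \eta_t + \eta_x u + \eta_y v + \eta_z w\\
W &= \zeta_t + \zeta_x u + \zeta_y v + \zeta_z w
\end{aligned} \qquad (2)$$

represent the contravariant velocity components.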

The Cartesian velocity components (u, v, w) are nondimensionalized with respect to a∞ (the free stream speed of sound). The density (ρ) is referenced to ρ∞ and the total energy (e) to ρ∞a∞². The local pressure is determined using the equation of state,

$$p = (\gamma - 1)\,[\,e - 0.5\,\rho\,(u^2 + v^2 + w^2)\,] \qquad (3)$$

where γ is the ratio of specific heats.
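As a quick check on Equation (3) and the nondimensionalization just described, the following sketch recovers the free stream pressure from the conserved variables. With density referenced to the free stream density and velocity to the free stream speed of sound, the nondimensional free stream pressure is 1/γ. The function names and the choice γ = 1.4 are assumptions for illustration.

```python
GAMMA = 1.4  # ratio of specific heats for air (assumed value)


def pressure(rho, u, v, w, e, gamma=GAMMA):
    """Equation of state: p = (gamma - 1) [e - 0.5 rho (u^2 + v^2 + w^2)]."""
    return (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v + w * w))


def freestream_state(mach, gamma=GAMMA):
    """Nondimensional free stream (rho, u, v, w, e) aligned with the x axis.

    Density is referenced to the free stream density and velocity to the
    free stream speed of sound, so rho = 1, u = Mach and p = 1 / gamma.
    """
    rho = 1.0
    u = mach
    p = 1.0 / gamma
    e = p / (gamma - 1.0) + 0.5 * rho * u * u
    return rho, u, 0.0, 0.0, e


if __name__ == "__main__":
    state = freestream_state(3.0)
    print("free stream pressure:", pressure(*state))
```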

While Equation (1) contains only two spatial derivatives, it retains all three momentum equations, thus allowing a degree of generality over the standard axisymmetric equations. In particular, the circumferential velocity is not assumed to be zero, thus allowing computations for spinning projectiles or swirl flow to be accomplished.

B. COMPUTATIONAL ALGORITHM. The azimuthal thin-layer Navier-Stokes equations are solved using an implicit approximate factorization finite difference scheme in delta form [6]. An implicit method was chosen because, for viscous flow problems, it permits a time step much greater than that allowed by explicit schemes. The Beam-Warming implicit algorithm has been used in various applications [1,9] for the equations in general curvilinear coordinates. The algorithm is first-order accurate in time and second- or fourth-order accurate in space. The equations are factored (spatially split), which reduces the solution process to one-dimensional problems at a given time level. Central difference operators are employed and the algorithm produces block tridiagonal systems for each space coordinate. The main computational work is contained in the solution of these block tridiagonal systems of equations.
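The structure of each one-dimensional sweep can be illustrated with the scalar analogue of a block tridiagonal solve. This is a simplified sketch, not the solver used in the code: in the actual algorithm each entry is a small matrix block and the divisions below become block factorizations. The function name and argument layout are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a scalar tridiagonal system by forward elimination and
    back substitution (Thomas algorithm).

    lower[i], diag[i], upper[i] multiply x[i-1], x[i], x[i+1] in row i.
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    system is assumed to be diagonally dominant.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # One implicit sweep of a model operator (I - r * second difference), r = 1.
    n = 6
    lower = [0.0] + [-1.0] * (n - 1)
    diag = [3.0] * n
    upper = [-1.0] * (n - 1) + [0.0]
    rhs = [1.0, 0.0, 2.0, -1.0, 0.5, 3.0]
    print(solve_tridiagonal(lower, diag, upper, rhs))
```

The cost grows linearly with the number of points along the sweep direction, which is why the factored form reduces a multi-dimensional implicit solve to a sequence of inexpensive line solves.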


C. FLOW FIELD BLANKING. The idea is to avoid geometric errors that may arise from wrap-around grids. Instead, we use straight-line grids as shown schematically in Figure 3. For the rotating band problem, the zone ABCD is part of the body and the flow field in this zone must be blanked out in the computational domain. As shown in Figure 3, the sharp corners and 90° bends ahead of and behind the band are preserved and no approximation is made. It is also necessary to apply boundary conditions on the zonal surfaces AB, BC and CD. As an initial attempt, inviscid boundary conditions are used at these boundaries, since the grid is rather coarse there. In addition, at points neighboring these boundaries, we use second-order spatial differencing and smoothing. The block tridiagonal matrix structure has been modified to allow continuous integration sweeps through such zones. Although we have only one zone for the rotating band case, changes have been made in the code to blank out multiple zones.
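One common way to keep a continuous sweep through a blanked zone is to replace the rows of blanked points by identity rows with a zero right-hand side, so that the computed update vanishes there. The scalar sketch below illustrates this idea under that assumption; it is not the modification made in the code described here, and the `iblank` flag array and function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Scalar Thomas algorithm (no pivoting); lower[0], upper[-1] ignored."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def blank_rows(lower, diag, upper, rhs, iblank):
    """Turn rows of blanked points into identity rows with zero rhs.

    iblank[i] is 1 for a flow point and 0 for a point inside the body
    (hypothetical flag array). Rows of flow points are left unchanged.
    """
    lo = [ib * a for ib, a in zip(iblank, lower)]
    di = [ib * b + (1 - ib) for ib, b in zip(iblank, diag)]
    up = [ib * c for ib, c in zip(iblank, upper)]
    r = [ib * d for ib, d in zip(iblank, rhs)]
    return lo, di, up, r


if __name__ == "__main__":
    n = 7
    iblank = [1, 1, 0, 0, 0, 1, 1]  # three points lie inside the band
    lower = [0.0] + [-1.0] * (n - 1)
    diag = [4.0] * n
    upper = [-1.0] * (n - 1) + [0.0]
    rhs = [float(i + 1) for i in range(n)]
    dq = solve_tridiagonal(*blank_rows(lower, diag, upper, rhs, iblank))
    print(dq)
```

Flow points adjacent to the blanked zone still reference their blanked neighbors, but those neighbors contribute a zero update, so the zone surface behaves like a fixed boundary for the sweep; the physical boundary values themselves must still be imposed separately.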

III. RESULTS. All the numerical computations were made at M∞ = 3.0 and α = 0. The projectile configuration with the rotating band that was used in this study is shown in Figure 4. This model is a cone-cylinder configuration with an 11.1° cone angle. The band height is 0.04 D and the width is 0.505 D. The same model was used in the experiments [10], which were conducted in the US Army Chemical Research Development and Engineering Center's Supersonic Wind Tunnel. Surface pressure measurements have been made ahead of and behind the band and are used for comparison with the numerical results.

Since the freestream flow is supersonic, the space-marching Parabolized Navier-Stokes code [3] was used to compute the solution over the forebody of the projectile (see Figure 5). This generated a solution at a station 30 band heights ahead of the band, which was then used as an upstream boundary condition for the computation of the flow field containing the rotating band. For this part of the flow field, which includes the band, the unsteady or time-dependent Navier-Stokes computational technique described earlier was used. This composite solution technique allowed a large number of grid points to be used in the vicinity of the band.

The computational grid used for the numerical calculations is shown in Figure 6. It consists of 139 points in the longitudinal direction and 60 points in the normal direction. The grid points are clustered near the surface of the cylindrical part with a minimum spacing of 0.00002 D. The resolution of grid points on the top of the band is not as fine. Grid points in the longitudinal direction are clustered near the upstream and downstream corners of the rotating band, where appreciable changes in the flow variables are expected. In Figure 6, the grid lines inside the band are omitted to show the position of the band; however, in the actual grid used in the computations, there are continuous grid lines inside the band, and those are the lines where the flow field blanking procedure is used.

For comparison purposes, a numerical solution is first obtained for flow over the cylindrical part of the projectile without the rotating band at M∞ = 3.0 and α = 0. The computed surface pressure coefficient is plotted in Figure 7 as a function of the longitudinal position. The computed result is in very good agreement with experimental data [10].

Numerical results obtained for the rotating band case are presented next. Figure 8a shows the velocity vector field in front of the band and, as expected, it shows recirculatory flow in that region. The reverse flow region extends about four band heights ahead of the band. Figure 8b shows the velocity vectors behind the band. The flow seems to expand over a large portion of the band height. A smaller recirculation region can be observed. The flow expansions can be better seen in Figure 9, which shows the pressure contours for this case. One can also see a separation shock wave ahead of the band. The shock wave is located just ahead of the flow separation region. The surface pressure coefficient for the band case is shown in Figure 10 as a function of the axial position. The solid line is the computed result, the dashed line is the result obtained for the case without the band and the circles are the experimental data for the band. There is a considerable change in the pressure due to the band. The sharp rise in pressure ahead of the band is associated with the compression waves, which actually precede the separation point of the boundary layer flow. The flow then expands near the corner and the pressure drops. No significant change in pressure occurs on the top of the band. At the backward step of the band, the flow expands again, which results in a sharp decrease in the pressure. This is followed by a more gradual return to the ambient pressure downstream. The computed surface pressure is in good agreement with the experimental data measured ahead of and behind the band.

IV. CONCLUDING REMARKS. A Navier-Stokes computational technique has been used in conjunction with a flow field blanking procedure for numerical simulation, which models the sharp corners and 90° bends exactly, thereby avoiding any possible source of geometric error. This procedure has been applied to the flow over a rotating band at supersonic speed.

Computed results have been obtained at M∞ = 3.0 and α = 0 and compared with available experimental data. The results show the recirculation regions both ahead of and behind the rotating band as well as the associated compression and expansion waves. The computed surface pressure coefficient for both cases, with and without the band, is in fairly good agreement with the experimental data. The present numerical procedure is simple to use and seems to predict the flow field correctly.

Figure 1. Examples of Flowfield Blanking
Figure 2. Schematics of Rotating Band Flowfield
Figure 3. Schematic Illustration of Flowfield Blanking
Figure 4. Model Geometry
Figure 6. Computational Grid Expanded Near the Model
Figure 9. Pressure Contours, M∞ = 3.0, α = 0

ABSTRACT. The time-dependent Navier-Stokes computational technique has been in general use at the Ballistic Research Laboratory to predict transonic flows over projectiles. Recently, efforts have been made to improve the computational efficiency of this code by analyzing and including a spatially varying time step and various improved artificial dissipation models. The combined effect of these changes has led to a significant gain in the robustness and convergence characteristics for steady state applications. These techniques have been used to compute the practical problem of flow over a projectile at transonic speeds. The results confirm the improvements achieved for such calculations.

I.  INTRODUCTION.  In  the  last  several  years,  computational  aerodynamic 
capabilities  have  been  developed  and  used  to  compute  projectile  aerodynamics 
at  transonic  speeds.  These  numerical  capabilities1*^  use  the  thin-layer 
Navier-Stokes computational technique and have been applied to various spinning and nonspinning projectiles at zero angle of attack. The time-dependent set of thin-layer Navier-Stokes equations is used and the solutions are marched in time until a steady state result is achieved. Since the primary interest is in the final steady state result, it is desirable to achieve the converged solution as quickly as possible, which depends on the computational algorithm and also on the computational architecture used.

The time-dependent Navier-Stokes codes can be vectorized to run on the vector processors of the Cray X-MP. A vectorized version of the code can run approximately 2-3 times faster than the original unvectorized code. A gain in computational efficiency can also be achieved through improvements made in the computational algorithm. Use of a spatially varying time step, improved numerical dissipation models and implicit treatment of the boundary conditions are some of the techniques that can be and have been used [1,4] to improve the overall efficiency of the computational technique. The combined effect of these changes has provided a significant gain in the robustness and convergence characteristics for steady state applications, which are of primary interest to us. The purpose of this paper is to incorporate a spatially varying time stepping procedure and improved artificial dissipation models into a BRL time-dependent Navier-Stokes code [1] for steady state applications in transonic projectile aerodynamics.

The resulting solver has been used to compute transonic flow over a secant-ogive-cylinder-boattail projectile at M∞ = 0.90 and α = 0. Computed results confirm the improvements achieved for such calculations. A brief description of the governing equations and the computational technique is given first. The algorithm improvements used for transonic viscous flow simulation are then described.

II. GOVERNING EQUATIONS AND COMPUTATIONAL TECHNIQUE. The azimuthal invariant (or generalized axisymmetric) thin-layer Navier-Stokes equations for general spatial coordinates ξ, η, ζ can be written as [1]:

$$\partial_\tau \hat{q} + \partial_\xi \hat{E} + \partial_\zeta \hat{G} + \hat{H} = Re^{-1}\,\partial_\zeta \hat{S} \qquad (1)$$

where

ξ = ξ(x,y,z,t) is the longitudinal coordinate
η = η(y,z,t) is the circumferential coordinate
ζ = ζ(x,y,z,t) is the near normal coordinate
τ = t is the time

The velocities

$$\begin{aligned}
U &= \xi_t + \xi_x u + \xi_y v + \xi_z w\\
V &= \eta_t + \eta_x u + \eta_y v + \eta_z w\\
W &= \zeta_t + \zeta_x u + \zeta_y v + \zeta_z w
\end{aligned} \qquad (2)$$

represent the contravariant velocity components.

The Cartesian velocity components (u, v, w) are nondimensionalized with respect to a∞ (the free stream speed of sound). The density (ρ) is referenced to ρ∞ and the total energy (e) to ρ∞a∞². The local pressure is determined using the equation of state,

$$p = (\gamma - 1)\,[\,e - 0.5\,\rho\,(u^2 + v^2 + w^2)\,] \qquad (3)$$

where γ is the ratio of specific heats.

In Equation (1), axisymmetric flow assumptions have been made which result in the source term, Ĥ. The details of how this is obtained can be found in Reference 1 and are not discussed here. Equation (1) contains only two spatial derivatives. However, it retains all three momentum equations and allows a degree of generality over the standard axisymmetric equations. In particular, the circumferential velocity is not assumed to be zero, thus allowing computations for spinning projectiles to be accomplished.

The numerical algorithm used is the Beam-Warming fully implicit, approximately factored finite difference scheme. The algorithm can be first- or second-order accurate in time and second- or fourth-order accurate in space. Since the interest is only in the steady-state solution, Equation (1) is solved in an asymptotic fashion and first-order accurate time differencing is used. The spatial accuracy is fourth order. Details of the algorithm are included in References 5-7.

To suppress high frequency components that appear in regions containing severe pressure gradients, e.g., shocks or stagnation points, artificial dissipation terms are added. Different dissipation models have been used and are described in the next section. The best results were obtained with a switching dissipation model, which is a blend of second- and fourth-order dissipation terms. This switching model is similar to the model used by Pulliam [3]; it uses fourth-order dissipation in smooth regions and switches to second-order dissipation in regions containing high pressure or density gradients. Incorporation of this dissipation model has improved the quality of the results and has made the code more robust.

III. DISSIPATION MODELS


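A. ORIGINAL DISSIPATION MODEL. The implicit approximately factored algorithm developed by Beam and Warming [7] has the form

$$[I + h\delta_\xi \hat{A}^n + D_\xi^{(2)}]\,[I + h\delta_\zeta \hat{C}^n - hRe^{-1}\delta_\zeta J^{-1}\hat{M}^n J + D_\zeta^{(2)}]\,\Delta\hat{q}^n = -h[\delta_\xi \hat{E}^n + \delta_\zeta \hat{G}^n - Re^{-1}\delta_\zeta \hat{S}^n + \hat{H}^n] - D_e\,\hat{q}^n \qquad (4)$$

where the explicit fourth-order dissipation is

$$D_e = \varepsilon_E\,\Delta t\,J^{-1}[(\nabla_\xi\Delta_\xi)^2 + (\nabla_\zeta\Delta_\zeta)^2]\,J$$

and the implicit second-order dissipation terms are

$$D_\xi^{(2)} = -\varepsilon_I\,\Delta t\,J^{-1}(\nabla_\xi\Delta_\xi)\,J, \qquad D_\zeta^{(2)} = -\varepsilon_I\,\Delta t\,J^{-1}(\nabla_\zeta\Delta_\zeta)\,J.$$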
The fourth-order explicit dissipation is used to control nonlinear instabilities, whereas the implicit dissipation is included to stabilize the explicitly treated fourth-difference terms. The parameter ε_E is O(1) and the parameter ε_I = (2-3) ε_E. If ε_E is large enough, in most cases stability of the scheme can be maintained. However, increased explicit smoothing makes the solution
less  accurate  and  in  many  cases  cannot  eliminate  the  oscillations  observed 
near the shock waves. An example of the type of oscillations which can be
found with the central finite difference solution is shown in Figure 1. This
figure shows a converged solution³ for an airfoil at M∞ = 0.8 and α = 0. The
numerical oscillations in the vicinity of the shocks (X/C = 0.4 and 0.6) are
apparent.


B. SWITCHING DISSIPATION MODEL. One way to eliminate the
oscillations near shocks is to use a second-order numerical dissipation
locally near the shocks and fourth-order dissipation elsewhere. This idea has
been used by Jameson et al.⁸ and forms the basis of this switching dissipation
model. As an example, a second-order dissipation was used³ in the region of
the shock wave for the flow problem of Figure 1, and the solution is shown in
Figure 2. The oscillations at the shock are eliminated and a smooth solution
is obtained.

As discussed in Reference 3, one can look to the upwind schemes for
guidance on how much dissipation is required. The upwind schemes have
inherent dissipation, and there is no need to add numerical dissipation
explicitly. In Reference 3, it has been shown that the upwind flux split
scheme of Steger and Warming⁹ is equivalent to using a central finite
difference plus some form of dissipation. For example, if the upwind scheme
is second-order, it can be written as a central difference plus an added
fourth-order dissipation

$$
\frac{\partial E}{\partial \xi} \approx \frac{E_{j+1} - E_{j-1}}{2\Delta\xi}
+ \frac{1}{4\Delta\xi}\,(\Delta_\xi \nabla_\xi)^2\,|A|\,q
\tag{5}
$$

where Δ_ξ and ∇_ξ are one-sided forward and backward finite-difference
operators (Δ_ξ u_j = u_{j+1} - u_j and ∇_ξ u_j = u_j - u_{j-1}). If first-order
differences are applied in the upwind scheme, then we get a central difference
plus a second-order dissipative term,

$$
\frac{\partial E}{\partial \xi} \approx \frac{E_{j+1} - E_{j-1}}{2\Delta\xi}
- \frac{1}{2\Delta\xi}\,(\Delta_\xi \nabla_\xi)\,|A|\,q
\tag{6}
$$


Generally,  the  best  approach  for  an  upwind  scheme  is  to  use  a  first- 
order  difference  at  shocks  and  second-order  elsewhere.  As  shown  above,  this 
is  equivalent  to  using  a  second-order  dissipation  near  the  shock  and  fourth- 
order  dissipation  elsewhere  for  a  central  finite  difference  algorithm  such  as 
the  one  used  in  our  unsteady  codes. 
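
The equivalence stated in Equations (5) and (6) can be checked numerically in the simplest setting. The sketch below assumes a scalar linear flux E = a q on a periodic grid, so that |A| reduces to |a| and the flux split is E± = (a ± |a|) q / 2; these choices and all function names are illustrative assumptions, not the authors' code. For the second-order case the non-dissipative part that emerges is the average of the two one-sided second-order differences, a centered stencil slightly wider than the two-point central difference.

```python
import numpy as np

def fwd(u):
    """Forward difference: (Delta u)_j = u_{j+1} - u_j on a periodic grid."""
    return np.roll(u, -1) - u

def bwd(u):
    """Backward difference: (Nabla u)_j = u_j - u_{j-1} on a periodic grid."""
    return u - np.roll(u, 1)

def split_speeds(a):
    """Scalar analogue of flux splitting: a = a_plus + a_minus."""
    return 0.5 * (a + abs(a)), 0.5 * (a - abs(a))

def upwind_first_order(u, a, dx):
    a_plus, a_minus = split_speeds(a)
    return (a_plus * bwd(u) + a_minus * fwd(u)) / dx

def central_with_second_difference(u, a, dx):
    # Two-point central difference minus |a| * (Delta Nabla) u / (2 dx), as in Eq. (6).
    central = a * (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)
    return central - abs(a) * fwd(bwd(u)) / (2.0 * dx)

def upwind_second_order(u, a, dx):
    a_plus, a_minus = split_speeds(a)
    back = (3.0 * u - 4.0 * np.roll(u, 1) + np.roll(u, 2)) / (2.0 * dx)
    forw = (-3.0 * u + 4.0 * np.roll(u, -1) - np.roll(u, -2)) / (2.0 * dx)
    return a_plus * back + a_minus * forw

def centered_with_fourth_difference(u, a, dx):
    # Average of the two one-sided stencils plus |a| * (Delta Nabla)^2 u / (4 dx), as in Eq. (5).
    centered = a * (4.0 * (np.roll(u, -1) - np.roll(u, 1))
                    - (np.roll(u, -2) - np.roll(u, 2))) / (4.0 * dx)
    return centered + abs(a) * fwd(bwd(fwd(bwd(u)))) / (4.0 * dx)
```

Both identities hold to rounding error for either sign of the wave speed, which is the sense in which an upwind scheme carries its own second- or fourth-difference dissipation.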

To mimic the flux split upwind difference scheme, a second-order
dissipation term is added to the right-hand side of Equation (4), which is
given as:


■  4aT  ^  1  p(A)(Aj.V£)J  q  . 

With the fourth-order dissipation term included, the full dissipation can be
written as:


~  I  1 1  [A  Ejj  1 7^7 1 5  J  9  "  5  ce  ^  ^3  (?) 

J 

where the first term is the second-order dissipation and the second term
contains the fourth-order dissipation. The coefficients ε_d and ε_e are the
associated coefficients for the second-order and fourth-order dissipation,
respectively. Note that the fourth-order dissipation is non-linear, in that
the coefficient is not a constant and is scaled by the spectral radius ‖A‖.
The two terms in Equation (7) are of the form δαδβ where:

<s  “ 6  8>j  ■  <«V  1  -  V  *  ^  <6j  -  "j-i> 
and


“  =  ed  1^1  or  ee 


0  =  J  q  or  JAVq 


For  automatic  switching  from  fourth-order  dissipation  to  second- 
order  dissipation  near  shocks  etc.,  we  introduce  a  scaling. 


■j- 


1  if  ee  >  ed  |^y|  (Fourth-Order) 

0  Ge  <  ed  l^r!  (Second-Order)  . 


(3) 


With  this  switching  built-in,  the  numerical  dissipation  term  in  the  streamwise 
direction,  for  example,  can  be  written  as: 

7  HAJIC.5(gj+1  *  9l)(5fjtl  -  5j)(i  --lilljlii)  -  .5(9j  *  gj.j) 


^  Vi  '  J2  5j)  (9) 


'j  *  'j-i 


.  c.M— qj  -  «»  qj., 


)] 


where  q  =  J  q,  gj  =  ed  ’  52  =  7A 


IIAJI  =  max  [(|U  +  UJ)(1  +  M J,  6.0] 


The  dissipation  term  in  the  normal  direction  is  similarly  added.  Here,  the 
pressure  gradient  is  used  in  the  second-order  dissipation  term  as  opposed  to 
the  density  gradient  used  in  the  longitudinal  direction. 
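
The switching logic described above can be sketched for a one-dimensional density profile. This is a minimal illustration under stated assumptions: the shock sensor used here, a normalized density jump across each cell face, is a hypothetical stand-in for the gradient measure in the paper, and the function name and arguments are not from the original code.

```python
import numpy as np

def dissipation_switch(rho, eps_e, eps_d):
    """Return the face-based second-order coefficient g and the switch psi.

    psi = 1 keeps the fourth-order dissipation (smooth flow);
    psi = 0 turns it off so only second-order dissipation acts (near a shock).
    """
    rho = np.asarray(rho, dtype=float)
    # Assumed sensor: |rho_{j+1} - rho_j| / (rho_{j+1} + rho_j) at face j+1/2.
    sensor = np.abs(np.diff(rho)) / (rho[1:] + rho[:-1])
    g = eps_d * sensor
    psi = np.where(eps_e > g, 1.0, 0.0)
    return g, psi

# A density jump from 1 to 3 between cells 4 and 5.
rho = np.array([1.0] * 5 + [3.0] * 5)
g, psi = dissipation_switch(rho, eps_e=0.01, eps_d=1.0)
```

With these inputs the fourth-order term is switched off only at the face that contains the jump and remains active everywhere else.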

IV. SPACE VARYING Δt. For projectile aerodynamics, the interest is
generally in obtaining a final steady state result; therefore, we can use time
step sequences or spatially variable time steps to accelerate convergence.
For a fixed Δt, the Courant number is not uniform since the grid spacings vary
from very fine to very coarse in the flow field region of interest. The use
of a space varying Δt can thus be interpreted as an attempt to use a more
uniform Courant number throughout the field.

For an aerodynamic simulation where the grid is highly stretched, we can
use a purely geometric variation of Δt given as³:

$$
\Delta t = \frac{(\Delta t)_{ref}}{1 + \sqrt{J}}
\tag{10}
$$

where J is the Jacobian of the transformation. The time step h in Equation (4)
is then replaced by Δt given in Equation (10).
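
The effect of Equation (10) on the Courant number can be illustrated with a small sketch. It assumes a one-dimensional stretched grid with unit computational spacing, so that the Jacobian is simply J = 1/Δx, and a constant wave speed; both are simplifying assumptions made only for this illustration.

```python
import numpy as np

def space_varying_dt(dt_ref, jacobian):
    """Purely geometric local time step, Eq. (10): dt = dt_ref / (1 + sqrt(J))."""
    return dt_ref / (1.0 + np.sqrt(jacobian))

# Assumed 1-D grid stretched geometrically from very fine to very coarse cells.
dx = np.geomspace(1.0e-4, 1.0, 50)
jacobian = 1.0 / dx          # assumption: unit spacing in computational space
wave_speed = 1.0             # assumption: constant characteristic speed

courant_fixed = wave_speed * 1.0e-4 / dx
courant_varying = wave_speed * space_varying_dt(1.0e-2, jacobian) / dx

spread_fixed = courant_fixed.max() / courant_fixed.min()
spread_varying = courant_varying.max() / courant_varying.min()
```

On this grid the ratio of largest to smallest Courant number drops by roughly a factor of fifty when the local time step is used, which is the "more uniform Courant number" interpretation given in the text.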

V. RESULTS. The model used in the computations consists of a three
caliber secant-ogive nose, a two caliber cylinder and a one caliber 7° boattail
(see Figure 3). Surface pressure measurements have been made by Kayser
et al.¹⁰ for this projectile and are compared with the present and past
computed results. For computational efficiency, the base flow is not included
and the boattail is extended as a sting.

The computational grid used for the numerical computations was obtained
using a modified version of a hyperbolic grid generator.¹¹ The full grid is
shown in Figure 4 and consists of 123 longitudinal points and 56 radial
points. The computational domain extends to about 3.5 body lengths in front
of, radially away from, and behind the projectile. An expanded view of the grid
near the projectile is shown in Figure 5. The grid points are clustered near
the ogive-cylinder and cylinder-boattail junctions in the longitudinal
direction. In the normal direction, the grid points are clustered near the
body surface with a minimum spacing of 0.00002 D and are stretched out to the
far field.

All the computations were made at M∞ = 0.98 and α = 0. The free stream
Reynolds number based on the total length is 4.56 × 10⁶. For turbulent flow
computations, an algebraic turbulence model by Baldwin and Lomax¹¹ is used.
Computations are started from initial freestream conditions and are marched in
time to obtain the steady state solution. Figures 6 and 7 show the pressure
contours and Mach contours, respectively, for a converged solution obtained
with the switching dissipation model. These figures show the qualitative
features of the flow, such as the expansions at the cylinder and boattail
corners as well as the location of the shock wave that exists on the
projectile.

The next set of figures shows the surface pressure coefficient as a
function of axial position. The experimental result is indicated by circles,
whereas the lines represent the computed results. Figure 8 shows the time
history of the solution at various time iterations using the old fourth-order
dissipation model. The expansions at the cylinder and boattail corners are
clearly seen in the results and are established within about 800 iterations.
The convergence is slowest near the shock wave on the cylindrical portion of
the projectile, where a large number of time iterations is needed for the
solution to converge. Figure 9 shows the result obtained with the new switching
dissipation model and varying time step procedure at 400 iterations. Although
this result has not converged, the solution agrees fairly well with the previous
converged solution from Figure 8. The final converged solution, shown in
Figure 10, is obtained after 600 time iterations with the improved version of
the code. This result is in excellent agreement with the experimental data.
The smoothing coefficients ε_e and ε_d used in the switching dissipation model
are 0.01 and 1.0, respectively, for this result.

The effect of these smoothing parameters on the numerical solution was
investigated. First, the second-order smoothing coefficient ε_d was changed.
The result is shown in Figure 11; the effect of varying ε_d was to change the
solution only minimally near the shock wave (X/D ≈ 4.5). The overall accuracy
of the solution is fairly good. Second, the fourth-order coefficient ε_e was
also varied, while the ratio of the two smoothing parameters was kept fixed.
Again, the results do not show any significant change in the pressure
distribution.

VI. CONCLUDING REMARKS. The original time-dependent Azimuthal-Invariant
Navier-Stokes code has been modified by including a switching dissipation model
and a variable time stepping procedure. This improved version of the code was
used to compute the flow over a projectile at M∞ = 0.98 and α = 0.

Significant  improvements  in  the  convergence  characteristics  have  been 
obtained  with  the  improved  version  of  the  code  for  steady  state  applications. 
The  total  CPU  time  has  been  reduced  by  a  factor  of  three  to  obtain  the 
converged result. In addition, the code is now more robust and is presently
being used for other numerical calculations.

Figure 2. Pressure Coefficient with Second Order Dissipation Near Shock
(Reference 3)

Figure 3. Model Geometry - SOCBT Projectile

Figure 5. Expanded View of the Grid Near the Projectile

Figure 7. Mach Contours, M∞ = .98, α = 0

Figure 8. Longitudinal Surface Pressure Distribution,
M∞ = .98, α = 0 (Old Dissipation Model)

Figure 12. Effect of Fourth-Order Smoothing (ε_e) on Surface Pressure
Distribution, M∞ = .98, α = 0

NUMERICAL  SOLUTION  OF  SYSTEMS  OF  PARTIAL 
DIFFERENTIAL  EQUATIONS 


Richard  E.  Ewing 

Departments  of  Mathematics  and  Petroleum  Engineering 

University  of  Wyoming 
Laramie,  Wyoming  82071 

ABSTRACT. Complex physical phenomena involving transport of heat or
fluids are often modeled by coupled systems of nonlinear partial differential
equations. The recent advances in computational capabilities with the advent
of new computer architectures have allowed the incorporation of more physics
into the models, resulting in larger, more complex mathematical models.
Research must therefore be increased in each phase of the modeling process
utilizing physical, mathematical, numerical, and computational concepts.
Transport dominated processes are notoriously difficult to treat numerically.
Techniques for treating systems of transport equations via a modified method
of characteristics are presented. The flux or fluid velocity is very important
to the flow directions required for the modified method of characteristics;
mixed finite element techniques for obtaining accurate fluid velocities are
presented. Also, many physical phenomena have highly local properties which
may move with time. Adaptive grid refinement methods are presented to resolve
this important dynamic local behavior. Finally, the influence of the computer
architecture upon the development of efficient algorithms for large scale
problems is discussed.

1.  INTRODUCTION.  The  need  for  the  study  and  use  of  mathematics  is 
growing  and  expanding  extremely  rapidly  in  response  to  the  enormous  recent 
development of computing capabilities. The use of complex models which
incorporate more detailed physics has necessitated more sophisticated
mathematics in
the  modeling  process.  In  this  way  a  broader  range  of  mathematics  is  needed  for 
applications.  In  this  paper,  we  shall  discuss  certain  aspects  of  the  expanding  scope 
of  mathematical  modeling. 

The  mathematical  techniques  which  are  used  to  model  multicomponent  or 
multiphase flow problems are representative of those needed for many other
applications such as chemically reacting or thermally driven flows and will be
used to illustrate the role of mathematics in modeling. The advent of
orders-of-magnitude better computing capabilities has allowed the modeling of
more complicated physical phenomena. We will indicate how this growth is
changing the entire modeling
process. 

Modeling of large-scale physical processes involves four major interrelated
stages. First, a physical model of the physical processes must be developed
incorporating as much physics as is deemed necessary to describe the essential
phenomena. A careful list of the assumptions made in establishing this physical
model should be compiled together with expected properties of the process, such
as bifurcation or physical instabilities, that should be modeled.
Second, a mathematical formulation of the physical model should be obtained,
usually involving coupled systems of nonlinear partial differential equations.
The properties of this mathematical model, such as existence, uniqueness, and
regularity of the solution, are then obtained and related to the physical
process to check the model. This part of the modeling process becomes
exceedingly difficult with large coupled systems of nonlinear partial
differential equations. Third, a discretized numerical model of the
mathematical equations is produced. This numerical model must have the
properties of accuracy and stability and produce solutions which represent the
basic physical features as well as possible without introducing spurious
phenomena associated with the specific numerical schemes. Obtaining asymptotic
error estimates via mathematical analysis for the systems of equations is
critical for accurate numerical simulation. Fourth, a computer program capable
of efficiently performing the necessary computations for the numerical model is
sought. Properties of the computer architecture to be used in the computation
must be considered strongly in the development of efficient computational
algorithms. Although the total modeling process encompasses aspects of
each of these four intermediate stages, the process is not complete with one
pass through the steps. Usually several iterations of this modeling loop are
necessary to obtain reasonable models for the highly complex physical phenomena
involved in many applications.

The  aims  of  this  paper  are  to  introduce  certain  complex  physical  phenomena 
which  need  to  be  better  understood,  to  illustrate  aspects  of  the  modeling  process 
used  to  describe  these  processes,  and  to  discuss  some  of  the  newer  mathematical 
tools  that  are  being  utilized  in  the  various  models.  The  complexity  of  the  models 
requires  sophisticated  mathematical  analysis.  For  example,  the  increasing  use  of 
large,  coupled  systems  of  nonlinear  partial  differential  equations  to  describe  the 
flow  of  multiphase  and  multicomponent  fluid  systems  is  identifying  very  difficult 
problems in the theoretical aspects of the partial differential equations, the
numerical analysis of various discretization schemes, the development of new,
accurate
numerical  models,  and  the  computational  efficiency  of  discrete  systems  resulting 
from  the  discretizations.  The  interplay  between  the  engineering  and  physics  of 
the  applications,  the  mathematical  properties  of  the  models  and  discretizations, 
and  the  role  of  the  computer  in  the  algorithm  development  is  critical  and  will  be 
stressed  in  this  presentation. 

The modeling of many fluid flow problems involves very similar mathematical
equations. Examples of mathematical and related physical properties of these
models which must be addressed include: (a) the resolution of sharp moving
fronts in convection dominated convection-diffusion problems, (b) the stability
and accuracy of discretization of highly non-self-adjoint differential
operators, (c) the need to have very accurate fluid velocities which dominate
the flow, (d) the need to model dynamic local phenomena which govern the
physics, and (e) the emphasis on development of efficient numerical procedures
for the enormous problems encountered.

A  model  problem  which  illustrates  many  major  numerical  difficulties  arising 
in fluid flow applications is presented in Section 2. The numerical stability
problems associated with this transport dominated system and the corresponding
pure
transport  problem  are  discussed.  A  modified  method  of  characteristics  based  on 
combining  the  transport  and  accumulation  terms  in  the  equation  into  a  directional 
derivative  along  characteristic-like  curves  is  then  briefly  described.  The  modified 
method  of  characteristics  is  heavily  dependent  upon  having  very  accurate  fluid 
velocities.  Section  3  is  then  devoted  to  the  description  of  a  mixed  finite  element 
procedure  which  is  designed  to  give  approximations  of  the  fluid  velocities  which 
are  just  as  accurate  as  the  pressure  approximations,  even  in  the  context  of  rapidly 
changing  fluid  properties.  The  interaction  between  the  computational  efforts  and 
associated error estimates is important. The need for adaptive local grid
refinement methods to resolve certain dynamic, highly localized physical
phenomena is
described  in  Section  4.  Important  considerations  such  as  a  choice  of  versatile  and 
efficient  data  structures  and  adaptivity  techniques  are  discussed. 


2. DESCRIPTION OF A MODEL PROBLEM AND THE MODIFIED
METHOD OF CHARACTERISTICS. A model system of equations describing
multicomponent flow of an incompressible fluid [20,31,39] is given by

$$
\nabla\cdot u = -\nabla\cdot\left(\frac{k}{\mu}\nabla p\right) = q,
\qquad x \in \Omega,\ t \in J, \tag{1}
$$

$$
\phi\frac{\partial c}{\partial t} + \nabla\cdot\left[uc - D(u)\nabla c\right] = \tilde{c}\,q,
\qquad x \in \Omega,\ t \in J, \tag{2}
$$

$$
u\cdot n = \left[uc - D(u)\nabla c\right]\cdot n = 0,
\qquad x \in \partial\Omega,\ t \in J, \tag{3}
$$

$$
c(x,0) = c_0(x), \qquad x \in \Omega, \tag{4}
$$

for Ω ⊂ ℝ² with boundary ∂Ω and J = [0, T], where p and u are the pressure and
velocity of the single phase fluid mixture, c is the concentration of one
component, q, the total volumetric flow rate, is smoothly distributed over Ω,
and D is assumed to be a diffusion-dispersion tensor given by [20,28,39]

$$
D = (D_{ij}(x,u)) = \phi\, d_m I
+ \frac{d_l}{|u|}\begin{pmatrix} u_1^2 & u_1 u_2 \\ u_1 u_2 & u_2^2 \end{pmatrix}
+ \frac{d_t}{|u|}\begin{pmatrix} u_2^2 & -u_1 u_2 \\ -u_1 u_2 & u_1^2 \end{pmatrix}
\tag{5}
$$

where u = (u_1, u_2), |u| is the Euclidean norm of u, d_m is the molecular
diffusion coefficient, and d_l and d_t are the magnitudes of longitudinal and
transverse dispersion. For many multiphase or multicomponent fluid flow
problems, Equation (1) would be replaced by some compressible or incompressible
form of the Navier-Stokes equations of flow. The given form of Equation (1)
results from an averaging of the Navier-Stokes equations and is applicable in
the context of flow through porous media. Equation (1) will be used for
simplicity of exposition in this paper. In this application, φ and k are media
properties and μ is the fluid viscosity. Extensions of the concepts and
techniques presented here to Navier-Stokes problems are possible.

Equation (2) is an example of a transport dominated convection-diffusion
equation. Since diffusion is small, the solution c exhibits very sharp fronts
or concentration gradients which move in time across the domain. The frontal
width is very narrow in general, but must be resolved accurately via the
numerical method since it describes the physics of the mixing zone and governs
the speed of the frontal movement. Similar dispersive mixing zones are critical
in the modeling of contaminant transport processes [1,2,25], combustion
problems, and other applications with moving, internal fluid interfaces.

If the dispersion tensor in Equation (2) is ignored, Equation (2) becomes
a first order hyperbolic problem instead of a transport dominated
convection-diffusion equation. Standard highly accurate finite difference
schemes for hyperbolic partial differential equations are known to be unstable,
and various upstream weighting or “artificial diffusion” techniques have been
utilized to stabilize this variant of Equation (2). The upstream weighting
techniques introduce artificial diffusion in the direction of the grid axes and
of a size proportional to the grid spacings. Thus, although this stabilizing
effect would be small if very fine grid block spacings were used, the enormous
size of many applications necessitates the use of large grid blocks and hence
large, directionally-dependent, artificially induced numerical diffusion which
has nothing to do with the physics of the flow. Two major problems in numerical
flow simulation today are due essentially to the use of standard upstream
weighting techniques. First, the upstream methods, by introducing a large
artificial numerical diffusion or dispersion, smear sharp fluid interfaces,
producing erroneous predictions of the degree of mixing and incorrect frontal
velocities. Second, the numerical diffusion is generated along grid lines and
produces results which may be radically different if the orientation of the
grid is rotated forty-five degrees.
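
The smearing attributed to upstream weighting can be seen in a one-dimensional experiment. The sketch below is an illustration only: it assumes pure advection with constant positive velocity on a periodic grid, first-order upstream differencing, and a square pulse as the initial concentration; the threshold used to count "front" cells is an arbitrary choice.

```python
import numpy as np

def upwind_advect(c, courant, n_steps):
    """First-order upstream scheme for c_t + a c_x = 0 with a > 0, periodic grid."""
    c = c.copy()
    for _ in range(n_steps):
        c = c - courant * (c - np.roll(c, 1))
    return c

def front_cells(c, lo=0.05, hi=0.95):
    """Number of cells whose value lies strictly inside the front (lo < c < hi)."""
    return int(np.sum((c > lo) & (c < hi)))

n = 200
x = (np.arange(n) + 0.5) / n
c_initial = np.where((x >= 0.2) & (x < 0.4), 1.0, 0.0)   # sharp fronts
c_final = upwind_advect(c_initial, courant=0.5, n_steps=100)
```

The exact solution keeps both fronts perfectly sharp, whereas the upstream-weighted solution spreads each front over many cells while still conserving total mass. For this scheme the effective numerical diffusion coefficient is a Δx (1 - ν) / 2, with ν the Courant number, which is proportional to the grid spacing as stated in the text.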

The use of physical intuition in determining a more accurate numerical scheme
can be illustrated in this case. The physical diffusion-dispersion term
displayed in Equation (5) is a rotationally-invariant tensor. Therefore, one
way to stabilize the first order hyperbolic problem without introducing
artificial directional effects is to use an “artificial diffusion” term of the
form in Equation (5). The size of this term must then be closely considered in
order not to diffuse fronts too badly. A consequence of this type of
stabilization with finite difference discretization is that a nine-point
difference star would be necessary to approximate the cross-derivatives
accurately instead of the standard five-point star used in two space
dimensions. In three space dimensions a twenty-seven point star would be
necessary to replace a seven point star. If iterative solution techniques are
being utilized, this greatly increases the solution times. This is a good
example of how decisions made in one part of the modeling process can greatly
influence other parts of the problem.
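
The rotational invariance of the tensor in Equation (5) can be verified directly: rotating the velocity by a matrix R should give D(Ru) = R D(u) Rᵀ. The sketch below is a small numerical check under assumed parameter values; the function name and the zero-velocity fallback are choices made for this illustration.

```python
import numpy as np

def dispersion_tensor(u, phi, d_m, d_l, d_t):
    """Diffusion-dispersion tensor of Eq. (5) for a 2-D velocity u = (u1, u2)."""
    u1, u2 = float(u[0]), float(u[1])
    speed = np.hypot(u1, u2)
    molecular = phi * d_m * np.eye(2)
    if speed == 0.0:
        return molecular                      # only molecular diffusion remains
    along = np.array([[u1 * u1, u1 * u2], [u1 * u2, u2 * u2]])
    across = np.array([[u2 * u2, -u1 * u2], [-u1 * u2, u1 * u1]])
    return molecular + (d_l / speed) * along + (d_t / speed) * across

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

u = np.array([0.8, -0.3])
params = dict(phi=0.2, d_m=1.0e-3, d_l=0.5, d_t=0.05)   # assumed values
R = rotation(0.7)
D_rotated_velocity = dispersion_tensor(R @ u, **params)
D_rotated_tensor = R @ dispersion_tensor(u, **params) @ R.T
```

The two matrices agree to rounding error. The velocity itself is an eigenvector of D with eigenvalue φ d_m + d_l |u|, and the perpendicular direction has eigenvalue φ d_m + d_t |u|, which is why d_l and d_t are called longitudinal and transverse dispersion.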

For more complex physical processes, the system of Equations (1)-(4) must be
expanded to include mass balances from different components or different
phases. The governing equations for combustion processes, for example, could
involve coupled systems of several nonlinear partial differential equations of
the form of Equation (2) depending upon the availability of oxygen, fuel, etc.
The interaction between these coupled nonlinear equations can greatly affect
the properties of the equations. Much work must be done to understand the
mathematical properties of existence, uniqueness, and continuous dependence of
solutions upon data for coupled systems of this form. Therefore, the improved
computing capabilities which allow the numerical approximation of large,
coupled systems of nonlinear partial differential equations are necessitating
the theoretical study of properties of systems of these equations. The
“applied” mathematician involved in the simulation must understand and be able
to work with these “purer” areas if the modeling process is to be effective.

The numerical analysis involved in rigorously obtaining asymptotic error
estimates for even the model problem presented in Equations (1)-(5) requires
various aspects of functional analysis and approximation theory. The order of
the approximations depends upon the use of fractional order Sobolev spaces and
interpolation spaces. Although asymptotic error estimates are not particularly
useful in obtaining realistic bounds for errors, they are very important in
determining which techniques work better than others and why. The analyses
involved in obtaining these estimates have greatly influenced our choice of
numerical schemes. Similarly, the analysis can help determine where special
locations which yield superconvergence results for the methods can be found.
These superconvergence results are especially important for coupled systems of
partial differential equations since the locations can often be utilized
efficiently in quadrature rules to describe more accurately the coupling
between the unknown variables. Asymptotic error estimates for the model problem
appear in [28,30,31,32].

In [19,37], Douglas and Russell described a technique based on a method
of characteristics approach for treating the first order hyperbolic part of
Equation (2). This technique, based on a form of Equation (2) which is
analogous to a convection-diffusion equation, was implemented by Russell
[37,38] and forms the basis for a particular time-stepping scheme which we have
used effectively.

In order to introduce a nondivergence form of Equation (2) that is used in
our numerical methods, we first expand the convection (∇ · uc) term with the
product rule and use Equation (1) to obtain

$$
\phi\frac{\partial c}{\partial t} + u\cdot\nabla c - \nabla\cdot\left[D(u)\nabla c\right]
= (\tilde{c} - c)\,\tilde{q}, \qquad x \in \Omega,\ t \in J, \tag{6}
$$

where q̃ = max{q, 0}. To avoid technical boundary difficulties associated with
our modified method of characteristics for Equation (6), in this exposition we
assume that Ω is a rectangle and that the problem given by Equations (1), (6),
(3), and (4) is Ω-periodic.

The basic idea is to consider the hyperbolic part of Equation (6), namely,
φ ∂c/∂t + u · ∇c, as a directional derivative. Accordingly, let s denote the
unit vector in the direction of (u_1, u_2, φ) in Ω × J, and set

$$
\psi(x) = \left(u_1(x)^2 + u_2(x)^2 + \phi^2\right)^{1/2}. \tag{7}
$$

Then Equation (6) can be rewritten in the form

$$
\psi\frac{\partial c}{\partial s} - \nabla\cdot(D\nabla c) + \tilde{q}\,c = \tilde{q}\,\tilde{c}. \tag{8}
$$

Note that the spatial operator in Equation (8) is now self-adjoint; symmetric
matrices will result from spatial discretization, and the associated numerical
methods will be better behaved. Since iterative solution techniques are used to
solve the nonlinear equations resulting from finite element discretization of
Equation (8), and since symmetry is very important in any of the useful
conjugate gradient iterative solvers, this change to symmetric matrices is very
important.

One critical aspect of the modified method of characteristics is the need for
accurate approximation of the directional derivative ∂c/∂s. Many methods based
upon characteristics fix a grid at time t^{n-1} and try to determine where
these points would move under the action of the characteristics. These “moving
point” or “front tracking” methods must then discretize Equation (6) and solve
for the unknowns c^n on a mesh of irregular or unpredictable nature. If too
large a time-step is chosen, serious difficulties can arise from the spatial
and temporal behavior of the characteristics. Front-tracking in two space
dimensions is difficult, while in three dimensions it is considerably more
difficult. For details of the discretization of ∂c/∂s and the ideas for
extending this method to higher space dimensions, see [29,30].
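
A one-dimensional sketch may help fix the idea of the time-stepping along characteristics. It is not the scheme of [29,30]: it assumes constant velocity u, porosity φ and diffusion coefficient, no source term, a periodic uniform grid, linear interpolation at the foot of each characteristic, and a finite difference (rather than finite element) treatment of the diffusion term. The directional derivative is approximated by ψ ∂c/∂s ≈ φ (c^n(x) - c^{n-1}(x̄)) / Δt with x̄ = x - u Δt / φ.

```python
import numpy as np

def mmoc_step(c_old, x, u, phi, diff, dt, length):
    """One backward-characteristic step followed by an implicit diffusion solve."""
    n = x.size
    dx = length / n
    # Foot of the characteristic that arrives at each fixed grid point x_i.
    feet = x - u * dt / phi
    c_bar = np.interp(feet, x, c_old, period=length)
    # Periodic second-difference matrix (symmetric, positive semidefinite).
    second = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    second[0, -1] = second[-1, 0] = -1.0
    matrix = (phi / dt) * np.eye(n) + (diff / dx**2) * second
    c_new = np.linalg.solve(matrix, (phi / dt) * c_bar)
    return c_new, matrix

n, length = 100, 1.0
x = np.arange(n) * length / n
c0 = np.exp(-200.0 * (x - 0.3) ** 2)
u, phi = 1.0, 0.5
dt = 3.0 * (length / n) * phi / u      # characteristics cross exactly 3 cells
c_pure, _ = mmoc_step(c0, x, u, phi, 0.0, dt, length)
c_diff, system = mmoc_step(c0, x, u, phi, 1.0e-3, dt, length)
```

Because the unknowns stay on the fixed grid, the linear system is symmetric regardless of the velocity, which is the property emphasized above; with zero diffusion and a whole-cell displacement the step reproduces the exact shift of the profile.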

3. MIXED FINITE ELEMENTS FOR PRESSURE AND VELOCITY.
Since both the modified method of characteristics and the diffusion-dispersion
term in Equation (6) are governed by the fluid velocity, accurate simulation
requires an accurate approximation of the velocity u. The coefficients k and μ
in Equation (1) can change rapidly in space. In this case, in order for the
flow to remain relatively smooth, the pressure changes extremely rapidly. Thus
standard procedures of solving Equation (1) as an elliptic partial differential
equation for pressure, differentiating or differencing the result to
approximate the pressure gradient, and then multiplying by the rapidly changing
function k/μ can produce very poor approximations to the velocity u. In this
section a mixed finite element method for approximating u and p simultaneously,
via a coupled system of first order partial differential equations, will be
discussed. This formulation accurately treats the problem of rapidly changing
flow properties.

The coupled system of first order equations used to define our methods arises
from Darcy’s Law and conservation of mass

$$
u = -\frac{k}{\mu}\nabla p, \qquad x \in \Omega, \tag{9}
$$

$$
\nabla\cdot u = q, \qquad x \in \Omega, \tag{10}
$$

subject to the boundary condition

$$
u\cdot n = 0, \qquad x \in \partial\Omega. \tag{11}
$$

Clearly Equations (9)-(11) will determine p only to within an additive constant.
Thus a normalizing constraint such as ∫_Ω p(x) dx = 0 or p(x_s) = 0 for some
x_s ∈ Ω is required in the computation to prevent a singular system.

We next define certain function spaces and notation. Let $W = L^2(\Omega)$ be the
set of all functions on $\Omega$ whose square is integrable. Let $H(\mathrm{div};\Omega)$ be the
set of vector functions $v \in [L^2(\Omega)]^2$ such that $\nabla \cdot v \in L^2(\Omega)$, and let

$$V = H(\mathrm{div};\Omega) \cap \{v \cdot n = 0 \text{ on } \partial\Omega\}. \tag{12}$$

Let $(v,w) = \int_\Omega v w\,dx$, $\langle v,w\rangle = \int_{\partial\Omega} v w\,ds$, and $\|v\|^2 = (v,v)$ be the standard
$L^2$ inner products and norm on $\Omega$ and $\partial\Omega$. We obtain the weak solution form of
Equations (9)-(11) by dividing each side of Equation (9) by $k/\mu$, multiplying by
a test function $v \in V$, and integrating the result to obtain

$$\left(\frac{\mu}{k}\,u, v\right) = (p, \nabla \cdot v), \quad v \in V. \tag{13}$$

The right-hand side of Equation (13) was obtained by further integration by parts
and use of Equation (12). Next, multiplying Equation (10) by $w \in W$ and
integrating the result, we complete our weak formulation, obtaining

$$(\nabla \cdot u, w) = (q, w), \quad w \in W. \tag{14}$$

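For a sequence of mesh parameters h > 0, we choose finite dimensional
subspaces $V_h$ and $W_h$ with $V_h \subset V$ and $W_h \subset W$ and seek a solution pair
$(U_h; P_h) \in V_h \times W_h$ satisfying

$$\left(\frac{\mu}{k}\,U_h, v_h\right) - (P_h, \operatorname{div} v_h) = 0, \quad v_h \in V_h, \tag{15}$$

$$(\operatorname{div} U_h, w_h) = (q, w_h), \quad w_h \in W_h. \tag{16}$$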
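A minimal one-dimensional sketch of the discrete system (15)-(16) may help fix ideas. It is an illustration only, not the code used for the results cited here. It assumes the unit interval, a uniform mesh, continuous piecewise-linear velocities with piecewise-constant pressures (the lowest-order pairing in one dimension), a coefficient $\mu/k$ frozen at cell midpoints, and midpoint quadrature for q. The function name and its arguments are hypothetical. The additive constant in the pressure is fixed by taking the minimum-norm solution of the singular system, which gives a zero-mean discrete pressure.

```python
import numpy as np

def mixed_darcy_1d(mu_over_k, q, n_cells):
    """Mixed method for u = -(k/mu) p', u' = q on (0, 1) with u(0) = u(1) = 0.

    Returns nodal velocities U (n_cells + 1 values) and cell pressures P
    (n_cells values, normalized to zero mean).
    """
    h = 1.0 / n_cells
    mid = (np.arange(n_cells) + 0.5) * h           # cell midpoints
    a = mu_over_k(mid)                             # mu/k frozen per cell (assumption)
    A = np.zeros((n_cells + 1, n_cells + 1))       # weighted velocity mass matrix
    B = np.zeros((n_cells, n_cells + 1))           # discrete divergence
    local_mass = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
    for i in range(n_cells):
        A[i:i + 2, i:i + 2] += a[i] * local_mass
        B[i, i], B[i, i + 1] = -1.0, 1.0           # integral of hat-function slopes
    Q = h * q(mid)                                 # midpoint rule for (q, w_h)

    inner = np.arange(1, n_cells)                  # boundary velocities are zero
    A_in = A[np.ix_(inner, inner)]
    B_in = B[:, inner]
    n_u = inner.size

    # Saddle-point system: A U - B^T P = 0 (15), B U = Q (16).
    K = np.block([[A_in, -B_in.T],
                  [B_in, np.zeros((n_cells, n_cells))]])
    rhs = np.concatenate([np.zeros(n_u), Q])
    # K is singular (constant pressures); lstsq returns the minimum-norm
    # solution, whose pressure part has zero mean.
    sol = np.linalg.lstsq(K, rhs, rcond=None)[0]

    U = np.zeros(n_cells + 1)
    U[inner] = sol[:n_u]
    P = sol[n_u:]
    return U, P
```

With $\mu/k = 1$ and $q(x) = \pi^2\cos(\pi x)$, the exact solution is $p = \cos(\pi x)$, $u = \pi\sin(\pi x)$, which gives a convenient check of both unknowns.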

We can now complete the description of our mixed finite element methods with
a discussion of particular choices of $V_h$ and $W_h$. Examples of these spaces are
presented in [33]. For problems with smooth coefficients and smooth forcing
functions, standard approximation theory results show that, by using higher order
basis functions, correspondingly higher order convergence rates can be obtained
[17,18].

Special  choices  of  basis  functions  for  the  Raviart-Thomas  spaces  [35]  based 
upon  Gauss-point  nodal  functions  and  related  quadrature  rules  have  significantly 
aided  in  the  computational  efficiency  of  these  methods.  For  detailed  descriptions 
of these bases and computational results, see [14,26,32]. The observed
convergence rates matched those predicted in [17,18]. Also superconvergence results
were obtained at specific locations which can be utilized in quadrature and
reduced quadrature considerations in the coupled systems described in Section 2.

Since the set of equations (9)-(11) will only determine the pressure to within
an  arbitrary  constant,  the  algebraic  system  arising  from  our  mixed  method  system 
(23)-(25)  is  not  definite  unless  constants  are  modded  out  of  the  approximating 
space  for  pressures.  If  the  unknowns  for  the  x  and  y  components  of  the 
velocity  are  formally  eliminated  from  the  resulting  system,  one  can  obtain  a  set 
of  equations  for  the  pressure  variable.  The  matrix  arising  in  this  problem  is  quite 
complex,  but  is  comparable  to  a  matrix  generated  by  finite  difference  methods 
for  the  pressure  [39].  Preconditioned  conjugate  gradient  iterative  procedures  have 
been  developed  to  efficiently  solve  this  set  of  linear  equations  [26,41]. 

Techniques  for  coupling  the  mixed  finite  element  procedures  with  a  modified 
method  of  characteristics  for  the  concentration  in  Equations  (2)-(6)  have  appeared 
in  the  literature  [23,29].  Asymptotic  error  estimates  and  convergence  rates  for  this 
coupled  procedure  also  appeared  in  [30]. 

4. ADAPTIVE LOCAL GRID REFINEMENT. Many of the chemical and
physical phenomena which govern chemically reacting or thermally driven flow
processes have extremely important local properties. Thus the models used in
computer codes for these problems must be capable of resolving these critical
local features. Also, in order to be useful in large-scale dynamic codes, these models
must be self-adaptive and extremely efficient. The development of adaptive grid
refinement techniques must take into account the rapid development of new,
advanced computer architectures. The compatibility of adaptive mesh modification
algorithms with the intended computer is a critical consideration in the algorithm
development.

The flexibility to dynamically change the number of grid points and thus the
number of unknowns in a problem can create difficulties in the linearization and
linear solution algorithms. In particular, it is extremely difficult to vectorize codes
with changing numbers of unknowns for efficient solution on vector machine
architectures. The ability to have truly local refinement and derefinement capabilities
necessitates the use of a fairly complex data structure. A data structure with
these properties has been developed and was described in the literature [15,21-24].
It is a multilinked list which utilizes various properties of the tree structures
presented in [36] and [7,8]. The structure allows efficient linear solution algorithms
via tree-traveling techniques. Although these algorithms are extremely difficult
to vectorize effectively, the tree structure lends itself well to parallelism at many
levels. Development of codes for efficient use of MIMD (Multiple Instruction
Multiple Data Stream) architectures to parallelize the local grid refinement algorithms
based upon a multiple linked-list data structure is under way [16]. Preliminary
experience with the Denelcor HEP, an MIMD machine, in this context has been
educational. Research in parallelization of these techniques will be continued on
a new Hypercube, obtained through DoD funding.

For truly general local refinement a complex data structure like those
discussed above and associated complications to the code are necessary. If local
refinement is only needed at a very few special points, a technique termed patch
refinement may be an attractive alternative. These concepts do not require as
complex a data structure but do involve ideas of passing information from one
uniform grid to another. Berger and Oliger have been using patch refinement
techniques for hyperbolic problems using finite difference discretizations for some
time [9,10].

The  idea  of  a  local-patch  refinement  method  is  to  pick  a  patch  that  includes 
most  of  the  critical  behavior  around  a  region  with  important  local  properties  and 
do  a  much  finer,  uniform  grid  refinement  within  this  patch.  Given  a  uniform  fine 
grid,  very  fast  solvers  can  be  applied  locally  in  this  region  using  boundary  data 
from  the  coarse  original  grid.  McCormick,  Thomas  and  co-workers  have  used 
multigrid  techniques  to  solve  the  fine-grid  problem  in  a  simple  elliptic  model  [27]. 
They  have  addressed  the  communication  problem  with  the  coarse  grid  and  have 
attained conservation of mass on their "composite grid." Extensions of their
technique, termed FCOM (fast composite mesh method), to more difficult problems
are planned.

Bramble, Pasciak and Schatz [10-12] have developed some efficient gridding
and preconditioning techniques which can also be used in the local-patch
refinement framework. Their methods have logically rectangular grids within the patch
which can be solved very rapidly via FFT preconditioners. The important problem
is the communication between grids. Recent work by Bramble, Pasciak, Schatz
and Ewing [13] uses preconditioning for local grid refinement in a way to make
implementation of the methods in existing large scale simulators an efficient process.
These techniques, based upon finite element preconditioners, could help produce
a major advancement in incorporating fixed local refinement methods in a wide
variety of applications and existing codes. We also feel that these methods are
sufficiently powerful to handle local time-stepping applications as well.

The  adaptivity  of  the  local  refinement  methods  must  be  driven  either  by  a 
type  of  “activity  index,”  which  relays  rapid  changes  in  solution  properties,  or  by 
some  estimate  of  the  errors  present  in  different  spatial  locations  which  need  to 
be  reduced.  Recently,  locally-computable  a  posteriori  error  estimators  have  been 
developed by Babuška and Rheinboldt [3-5], Bank [6], Weiser [40], and Oden [34].
Under suitable assumptions, these error estimators converge to the norm of the
actual  error  as  the  mesh  size  tends  to  zero.  These  a  posteriori  error  estimators  are 
extremely  important  for  problems  involving  elliptic  partial  differential  equations  in 
determining  the  reliability  of  estimates  for  a  fixed  grid  and  a  fixed  error  tolerance 
in  a  given  norm.  The  error  estimators  are  used  to  successively  refine  locally 
until  the  errors  in  a  specified  norm  are,  in  some  sense,  equilibrated.  Although 
these  methods  are  very  effective  for  elliptic  problems,  they  are  not  efficient  for 
large  time-dependent  problems  where  an  “optimal”  mesh  at  each  time  step  is  not 
“optimal”  for  the  entire  time-dependent  problem. 
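The refine-until-equilibrated loop described above can be sketched in one space dimension. This is a hedged illustration, not one of the cited estimators: the caller supplies a hypothetical per-element indicator (for example an interpolation-error proxy $h^2|g''|$), and every element whose indicator exceeds a tolerance is bisected until none does. The function name and arguments are assumptions made for the example.

```python
import numpy as np

def refine_until_equilibrated(nodes, indicator, tol, max_passes=30):
    """Bisect every element whose local indicator exceeds tol.

    nodes      : increasing array of grid points
    indicator  : callable (midpoints, widths) -> per-element indicator values
    tol        : target level for every local indicator
    """
    nodes = np.asarray(nodes, dtype=float)
    for _ in range(max_passes):
        h = np.diff(nodes)                         # element widths
        mid = 0.5 * (nodes[:-1] + nodes[1:])       # element midpoints
        marked = indicator(mid, h) > tol
        if not marked.any():                       # all indicators at or below tol
            break
        # Insert the midpoint of each marked element as a new node.
        nodes = np.sort(np.concatenate([nodes, mid[marked]]))
    return nodes
```

For a sharply peaked function the loop leaves the coarse elements far from the peak untouched and concentrates points near it, which is the local behavior the error estimators are meant to drive.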

For hyperbolic or transport dominated parabolic partial differential
equations, sharp fronts move along characteristic or near-characteristic directions.
Therefore the computed velocity determines both the local speed and direction
of the regions where local refinement will be needed at the next time steps. This
information should be utilized to help move the local refinement with the front.
Although patch refinement techniques based upon characteristic-direction
adaptation strategies do not determine a "locally optimal" grid, the waste in using more
grid than necessary is compensated for by the overall efficiency. Use of a larger
refined area and grid movement only after several time-steps is the technique that
we are developing since efficiency is crucial in large-scale reservoir simulation.
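The idea of a larger refined area that is moved only every few time steps can be illustrated with a short one-dimensional sketch. It is an assumption-laden example, not the authors' implementation: the front position, the computed velocity, the padding, and the function name are all hypothetical, and the velocity is taken as constant over the interval between regriddings.

```python
def refinement_patch(front_pos, velocity, dt, steps, pad, domain=(0.0, 1.0)):
    """Return (lo, hi) of a patch covering the front for the next `steps` steps.

    The front is assumed to travel along the characteristic direction with
    the given velocity, so it stays inside the patch until the next regrid.
    """
    travel = velocity * dt * steps                 # distance covered before regridding
    lo = min(front_pos, front_pos + travel) - pad
    hi = max(front_pos, front_pos + travel) + pad
    # Clip the patch to the computational domain.
    return max(domain[0], lo), min(domain[1], hi)
```

A larger `steps` gives a wider patch and fewer regriddings. That is the trade of extra grid for overall efficiency described above.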

Variable coefficients in the partial differential equations significantly
complicate local refinement techniques for finite difference methods. At present,
techniques for weighting the finite difference stars based upon the varying coefficient
values seem to be "ad hoc" and can often cause serious errors in the flow
description. Local refinement techniques with finite element methods always yield
a straightforward way to evaluate and weight the coefficients and are, in general,
much easier to apply. Thus the versatility of variational techniques often more
than compensates for the slight addition in computational complexity of finite
element methods.
ABSTRACT. We present sufficient conditions for the asymptotic stability of steady
solutions of initial-boundary value problems for parabolic conservation laws that model
one-dimensional  transonic  flow  in  ducts  of  variable  cross-sectional  area.  The  stability 
conditions  consist  of  the  standard  (Lax)  entropy  conditions  for  the  associated  hyperbolic 
system  and  a  local  dissipativity  condition  on  the  source  terms.  In  the  context  of  the  duct 
flow  problem  the  latter  is  a  geometric  condition  that  guarantees  the  asymptotic  stability 
of  the  standing  shock  wave  known  to  exist  in  the  diverging  portion  of  the  duct. 

I. INTRODUCTION. The problem of determining the stability (or instability) of
steady  solutions  of  the  equations  of  motion  for  a  viscous,  heat-conducting  fluid  has 
attracted  the  attention  of  many  physical  scientists  and  applied  mathematicians  since  the 
basic  equations  were  written  down  in  the  last  century.  In  this  paper  we  study  the 
asymptotic  stability  of  steady  shock-layer  solutions  of  hyperbolic  systems  of  conservation 
laws  to  which  have  been  added  formally  small  terms  representing  the  effects  of  viscosity 
and  heat  conduction.  Our  approach  is  to  employ  the  usual  inviscid  entropy  conditions 
(which  are,  of  course,  stability  conditions  [5])  in  conjunction  with  a  condition  on  the 
source  terms  in  a  neighborhood  of  the  actual  viscous  layer,  in  order  to  estimate  the  size 
of  an  initial  perturbation.  Inside  the  layer  we  also  make  use  of  some  asymptotic 
estimates  on  solutions  of  a  steady  equation  resulting  from  the  balance  between  inertial 
and  viscous  forces.  We  are  able  to  show  that  under  the  stated  assumptions  such  a  steady 
solution  is  asymptotically  stable  with  respect  to  all  sufficiently  small  perturbations  in  the 
initial  data. 

This  study  was  motivated  by  the  recent  papers  [7,8],  [2]  which  are  concerned,  in 
part,  with  the  stability  properties  of  standing  transonic  shock  waves  in  the  flow  of  an 
ideal  gas  through  a  duct  of  variable  cross-sectional  area.  We  therefore  discuss  in  Section 
III  how  our  result  for  the  general  problem,  when  applied  to  the  gasdynamic  model, 
guarantees  the  asymptotic  stability  of  any  standing  viscous  shock  wave  located  in  a 
diverging  portion  of  the  duct.  As  a  prelude,  we  treat  rather  thoroughly  in  the  next  section 
an  instructive  scalar  problem  in  order  to  illustrate  our  approach  in  a  simple  setting. 

II. A MODEL PROBLEM. Consider the following problem

$$u_t + u u_x - r(x)u = \varepsilon u_{xx}, \quad 0 < x < 1,\ t > 0,$$

$$u(x,0,\varepsilon) = \varphi(x,\varepsilon), \quad x \text{ in } [0,1], \tag{2.1}$$

$$u(0,t,\varepsilon) = \alpha, \quad u(1,t,\varepsilon) = \beta, \quad t \text{ in } [0,\infty),$$

where $r$, $\varphi$ are smooth functions, $\varepsilon$ is a small positive parameter and $\alpha$, $\beta$ are constants; cf.
[4]. The inviscid ($\varepsilon = 0$) version of (2.1) was considered by Embid et al. [2]. It is not
difficult to see that the corresponding steady problem

$$\varepsilon U_{xx} = U U_x - r(x)U, \quad 0 < x < 1, \tag{2.2}$$

$$U(0,\varepsilon) = \alpha, \quad U(1,\varepsilon) = \beta,$$

has shock-layer solutions $U = U(x,\varepsilon)$ connecting the inviscid branches $U_L(x) := R(x) + \alpha$ and
$U_R(x) := R(x) - R(1) + \beta$, for $R(x) := \int_0^x r(s)\,ds$, provided $U_L$ and $U_R$ satisfy the entropy
condition

$$U_L(x) > 0 > U_R(x) \quad \text{in } [0,1]; \tag{2.3}$$

cf. [4], [2]. The location $x_0$ of the shock layer is obtained from the Rankine-Hugoniot
relation $(U_L + U_R)(x_0) = 0$, that is, $R(x_0) = [R(1) - \alpha - \beta]/2$, which may have more than
one solution $x_0$ in (0,1), depending on the form of $r$.
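As a small worked illustration of the Rankine-Hugoniot relation and the entropy condition (2.3), the sketch below locates one shock position by bisection. It is an example under stated assumptions, not part of the analysis: the caller supplies the antiderivative R as a callable, the function names are hypothetical, and bisection returns only one root even when several exist.

```python
def shock_location(R, alpha, beta, tol=1e-12):
    """Solve R(x0) = [R(1) - alpha - beta] / 2 for x0 in (0, 1) by bisection."""
    target = 0.5 * (R(1.0) - alpha - beta)
    g = lambda x: R(x) - target
    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) > 0.0:
        raise ValueError("no sign change on [0, 1]; bisection cannot start")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid                               # root lies in [lo, mid]
        else:
            lo = mid                               # root lies in [mid, hi]
    return 0.5 * (lo + hi)

def entropy_condition_holds(R, alpha, beta, samples=201):
    """Check U_L(x) > 0 > U_R(x) at equally spaced sample points of [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return all(R(x) + alpha > 0.0 > R(x) - R(1.0) + beta for x in xs)
```

For example, with $r(x) = -1$, so that $R(x) = -x$, and $\alpha = 1.5$, $\beta = -1.5$, the branches are $U_L = 1.5 - x$ and $U_R = -0.5 - x$, the entropy condition holds on [0,1], and the relation gives $x_0 = 1/2$.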

Let us now test the stability of one such steady solution $U(x,\varepsilon)$, having a shock layer
of width $O(\varepsilon)$ at $x_0$ in (0,1), by introducing the perturbation $w(x,t,\varepsilon) := u(x,t,\varepsilon) - U(x,\varepsilon)$ into
(2.1). The perturbation problem so obtained is

$$w_t + [(U + w)^2 - U^2]_x/2 - r(x)w = \varepsilon w_{xx},$$

$$w(x,0,\varepsilon) = \psi(x,\varepsilon), \quad w(0,t,\varepsilon) = w(1,t,\varepsilon) = 0, \tag{2.4}$$


where $\psi := \varphi - U$ is the initial perturbation. Thus, in order to show that the standing shock
U is an asymptotically stable steady state of (2.1) it is enough to show that $w \equiv 0$ is an
asymptotically stable solution of (2.4). We begin by examining first a small (two-sided)
neighborhood A of $x_0$ and noting that in A the initial value problem

$$\varepsilon W_x = [(U + W)^2 - U^2]/2, \quad W(x_0,\varepsilon) \ge \|\psi\|_\infty, \tag{2.5}$$

has a positive solution $W = W(x,\varepsilon)$ which behaves like a $\delta$-function peaked at $x_0$. More
precisely, we have that

$$W(x,\varepsilon) = O\big(W(x_0,\varepsilon)\exp[-\gamma(x - x_0)^2/2\varepsilon^2]\big), \tag{2.6}$$

for a known positive constant $\gamma$, since (2.5) is a Bernoulli equation which becomes a linear
equation in the variable 1/W. Using W we can now construct a barrier function for (2.4)
that ensures the asymptotic stability of $w \equiv 0$ in A, provided we assume also that

$$r(x) \le -\nu < 0 \quad \text{in A} \tag{2.7}$$

for a positive constant $\nu$. A suitable barrier function is

$$\Omega(x,t,\varepsilon) := W(x,\varepsilon)\,e^{-\mu t}, \quad 0 < \mu < \nu,$$

since $-\Omega(x,0,\varepsilon) \le \psi(x,\varepsilon) \le \Omega(x,0,\varepsilon)$, $\Omega_t + \Phi_x(x,\Omega,\varepsilon) - r(x)\Omega - \varepsilon\Omega_{xx} \ge 0$ and
$(-\Omega)_t + \Phi_x(x,-\Omega,\varepsilon) - r(x)(-\Omega) - \varepsilon(-\Omega)_{xx} \le 0$, for $\Phi(x,w,\varepsilon) := Uw + w^2/2$. To verify this it is enough to note that
$$\Omega_t + [U\Omega + \Omega^2/2]_x - r(x)\Omega - \varepsilon\Omega_{xx}$$

$$= e^{-\mu t}\{-\mu W + W W_x e^{-\mu t} - W W_x - r(x)W\}$$

$$\ge W e^{-\mu t}\{\nu - \mu - (1 - e^{-\mu t})W_x\}$$

$$\ge 0$$

in A if $\mu$ and $W(x_0,\varepsilon)$ are sufficiently small, since $W_x(x,\varepsilon) = O\big(W(x_0,\varepsilon)[\,|x - x_0|/\varepsilon + W(x_0,\varepsilon)]/\varepsilon\big)$.
This estimate on $W_x$ follows directly from (2.6) and the estimate $-U(x,\varepsilon) = O(|x - x_0|/\varepsilon)$,
which holds in A. Consequently in a small neighborhood of the shock layer
the solution of (2.4) satisfies

$$|w(x,t,\varepsilon)| \le W(x,\varepsilon)\,e^{-\mu t},$$

and so $w \equiv 0$ (and hence, U) are asymptotically stable there with respect to all sufficiently
small perturbations of the initial data.
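The reduction of (2.5) to a linear equation can be written out explicitly. The following is a sketch only; the local form assumed for U in the second line is suggested by the estimate on U quoted above, with the constant $\gamma$ left unspecified.

```latex
\begin{align*}
\varepsilon W_x = UW + \tfrac12 W^2, \qquad Z := 1/W
  \quad&\Longrightarrow\quad \varepsilon Z_x = -UZ - \tfrac12 , \\
U(x,\varepsilon) \approx -\gamma\,(x - x_0)/\varepsilon \ \text{in A}
  \quad&\Longrightarrow\quad
  Z_{\text{hom}}(x) \propto \exp\!\big[\gamma (x - x_0)^2 / 2\varepsilon^2\big].
\end{align*}
```

Under that assumption the homogeneous part of Z grows like a Gaussian away from $x_0$, so $W = 1/Z$ decays with the profile stated in (2.6).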

Away from the layer the analysis actually simplifies dramatically, thanks to the
entropy condition (2.3). To see this we begin by recalling some basic results on the
asymptotic stability of the trivial solution of the general problem

$$w_t + f(x,w,\varepsilon)w_x + g(x,w,\varepsilon) = \varepsilon w_{xx}, \quad 0 < x < 1,\ t > 0,$$

$$w(x,0,\varepsilon) = \psi(x,\varepsilon), \quad w(0,t,\varepsilon) = w(1,t,\varepsilon) = 0, \tag{2.8}$$

where $g(x,0,\varepsilon) \equiv 0$; cf. [4].

Lemma 2.1. Suppose there exists a positive constant m such that for $(x,w,\varepsilon)$ in
$\mathcal{D} := [0,1] \times [-\delta,\delta] \times (0,\varepsilon_0]$

$$(\partial g/\partial w)(x,w,\varepsilon) \ge m > 0,$$

for $\delta$ and $\varepsilon_0$ positive constants. Then the solution w of (2.8) satisfies

$$|w(x,t,\varepsilon)| \le \|\psi\|_\infty e^{-mt} \quad \text{in } [0,1] \times [0,\infty) \times (0,\varepsilon_0]$$

provided $\|\psi\|_\infty \le \delta$.

This is the familiar result involving "linearized" stability, and it follows by noting
that $\Theta(t) := \|\psi\|_\infty e^{-mt}$ is a barrier function for (2.8). The next result imposes a strong
condition on the function f and a relatively mild one on g.

Lemma 2.2. Suppose there exist positive constants $k$ and $\ell$ such that for $(x,w,\varepsilon)$ in $\mathcal{D}$

$$|f(x,w,\varepsilon)| \ge k > 0 \quad \text{and} \quad (\partial g/\partial w)(x,w,\varepsilon) \ge -\ell.$$

Then the solution w of (2.8) satisfies

$$|w(x,t,\varepsilon)| \le L e^{-nt} \quad \text{in } [0,1] \times [0,\infty) \times (0,\varepsilon_0]$$

provided $\|\psi\|_\infty \le \delta$. Here $L := \|\psi\|_\infty (2e^{-\lambda} - 1)$ and $n := \ell/(2e^{-\lambda} - 1)$, for $\lambda := -\ell/k + O(\varepsilon)$ a
negative root of the characteristic polynomial $\varepsilon\lambda^2 + k\lambda + \ell$.

This result follows because the positivity of $|f|$ and semiboundedness of $g_w$ allow us
to convert the equation in (2.8), via an exponential change of variable (in x), into an
equivalent equation whose g-part satisfies the positivity condition in Lemma 2.1; cf. [4].

Returning to the perturbation problem (2.4) away from the shock layer, we see that
by virtue of the entropy condition (2.3) Lemma 2.2 applies immediately, since

$$|f(x,w,\varepsilon)| = |U(x,\varepsilon) + w| \quad \text{is positive}$$

and

$$|(\partial g/\partial w)(x,w,\varepsilon)| = |U_x(x,\varepsilon) - r(x)|$$

is bounded outside of A, for all sufficiently small initial perturbations. We conclude
therefore that an entropy-condition-satisfying standing shock wave solution of problem
(2.1) is asymptotically stable provided the additional condition (2.7) is satisfied in a
neighborhood of the viscous layer. Embid et al. [2] have shown that an
entropy-condition-satisfying inviscid standing shock is, in fact, unstable if this negativity assumption is
violated. As we shall see in the next section, the negativity [positivity] of r models in a
simple way the divergence [convergence] of a duct containing standing transonic shock
waves.

III. THE GASDYNAMIC EQUATIONS. In this final section we give a heuristic
analysis of the asymptotic stability of a standing viscous shock wave in the diverging
portion of a variable-area duct. The equations that model the transonic flow of an
inviscid, non-heat-conducting gas in a duct of unit length written in divergence form are
(cf. [6; Chap. 2], [7,8])

$$\rho_t + (\rho u)_x + c(x)\,\rho u = 0,$$

$$(\rho u)_t + (\rho u^2 + p)_x + c(x)\,\rho u^2 = 0, \quad 0 < x < 1,\ t > 0,$$

$$(\rho E)_t + (\rho E u + p u)_x + c(x)\,(\rho E u + p u) = 0,$$

where $c(x) := a'(x)/a(x)$, for $a(x)$ the cross-sectional area of the duct, and $\rho$, $u$, $p$ and $E$ are
the density, velocity, pressure and total energy of the gas, respectively. If we let
$v := (\rho, \rho u, \rho E)$ denote the vector of conserved densities, $f := (\rho u, \rho u^2 + p, \rho E u + p u)$ the vector of
fluxes and $g := (\rho u, \rho u^2, \rho E u + p u)$, then we can write these equations more simply as

$$v_t + f(v)_x + c(x)\,g(v) = 0. \tag{3.1}$$
If we now assume that dissipative effects such as viscosity ($\mu$) and thermal conductivity
($\kappa$) are small but nonzero, then we must add to the righthand sides of the second and third
equations in (3.1) second-order terms proportional to $\mu$ and $\kappa$. The assumed smallness of
these coefficients suggests modelling their presence by adding to the righthand side of
(3.1) the formally small term $\varepsilon B v_{xx}$, where $0 < \varepsilon \ll 1$ and B is a constant matrix whose
eigenvalues have nonnegative real parts (cf. [9; Chap. 15]). Thus we are led to consider the
following parabolic initial-boundary value problem

$$v_t + f(v)_x + c(x)\,g(v) = \varepsilon B v_{xx}, \quad 0 < x < 1,\ t > 0,$$

$$v(x,0,\varepsilon) = \varphi(x,\varepsilon), \quad x \text{ in } [0,1], \tag{3.2}$$

$$v(0,t,\varepsilon) = v_0, \quad v(1,t,\varepsilon) = v_1, \quad t \text{ in } [0,\infty),$$

as a model for time-dependent, one-dimensional transonic flow in a variable-area duct
with prescribed supersonic inlet (x=0) and subsonic outlet (x=1) values of $\rho$, $u$ and $E$.

Let us focus our attention now on the steady-state solutions of (3.2), that is, on
solutions of the boundary value problem

$$\varepsilon B V_{xx} = f(V)_x + c(x)\,g(V), \quad 0 < x < 1, \tag{3.3}$$

$$V(0,\varepsilon) = v_0, \quad V(1,\varepsilon) = v_1,$$

as $\varepsilon \to 0$. Under various simplifying assumptions this problem is known to have solutions of
shock-layer type in regions where the function c is either positive or negative; cf. [2], [3].
Such solutions represent standing shock wave solutions of the original gasdynamic
equations (3.1) either with structure (if $\varepsilon > 0$) or without structure (if $\varepsilon = 0$). In particular,
Embid et al. [2] give rather detailed results in the case of isentropic flow of an inviscid,
non-heat-conducting ($\varepsilon = 0$) gas through the study of an algebraic equation (the Hugoniot
curve) derived from solutions of the first-order system $f(V)_x + c(x)\,g(V) = 0$. They prove
the existence of standing shock waves in both the converging (c<0) and diverging (c>0)
portions of the duct that satisfy the entropy condition

$$u_L - s_L > 0 > u_R - s_R.$$

Here $s$ ($:= (\partial p/\partial\rho)^{1/2}$) is the local speed of sound and the subscripts L and R denote the
limiting values of the variables to the left and the right of the shock, respectively. In
order to proceed with our analysis we assume that the problem (3.3) has a smooth solution
$V = V(x,\varepsilon)$ as $\varepsilon \to 0$ representing a transonic standing wave centered at $x_0$ in (0,1), that is,

$$\lim_{\varepsilon \to 0} V(x,\varepsilon) = \begin{cases} V_L(x), & 0 < x < x_0, \\ V_R(x), & x_0 < x < 1, \end{cases}$$

which satisfies the entropy condition

$$U_L(x) > s_L(x), \quad 0 < x < x_0; \qquad U_R(x) < s_R(x), \quad x_0 < x < 1, \tag{3.4}$$

for $U(x)$ the ($\varepsilon \to 0$)-limit of $u(x,\varepsilon)$. We claim that such a shock is asymptotically stable
with respect to all sufficiently small initial perturbations provided the function c is
positive in a neighborhood of $x_0$.

To convince ourselves of this let us begin by examining briefly the same
initial-boundary value problem (3.2) for the general parabolic system of n equations

$$v_t + F(v)_x + G(x,v) = \varepsilon B v_{xx},$$

under the basic assumption that its unperturbed ($\varepsilon = 0$) part is strictly hyperbolic and
genuinely nonlinear. (Strict hyperbolicity means that the eigenvalues $\lambda(v)$ of the Jacobian
matrix dF are real and distinct for all v of interest, say $\lambda_1(v) < \lambda_2(v) < \cdots < \lambda_n(v)$, while
genuine nonlinearity is a generalization of the scalar notion of convexity to vector
functions; cf. [5], [9; Chap. 17].) Suppose that the corresponding steady boundary value
problem has a smooth solution $V = V(x,\varepsilon)$ with a shock layer at $x_0$ in (0,1) which in the
limit $\varepsilon \to 0$ is a standing k-shock ($1 \le k \le n$) satisfying the Lax entropy conditions (cf. [5], [9;
Chap. 15])

$$\lambda_n(V_L) > \cdots > \lambda_k(V_L) > 0 > \lambda_k(V_R) > \cdots > \lambda_1(V_R),$$

$$\lambda_1(V_L) < \cdots < \lambda_{k-1}(V_L) < 0 < \lambda_{k+1}(V_R) < \cdots < \lambda_n(V_R). \tag{3.5}$$

If we now introduce the perturbation $w := v - V$, then the resulting problem for w is

$$w_t + \{F(V+w) - F(V)\}_x + G(x,V+w) - G(x,V) = \varepsilon B w_{xx},$$

$$w(x,0,\varepsilon) = \psi(x,\varepsilon) := \varphi(x,\varepsilon) - V(x,\varepsilon), \quad w(0,t,\varepsilon) = w(1,t,\varepsilon) = 0, \tag{3.6}$$

and so the asymptotic stability of V implies and is implied by the condition that
$\lim_{t \to \infty} \|w\|_\infty = 0$. Now away from $x_0$ these entropy conditions imply the asymptotic stability
of V; this was shown by Liu [7,8] for the gasdynamic equations (3.1). Thus we have only to
show that in an immediate neighborhood A of $x_0$ the perturbation w decays to 0 as $t \to \infty$. In
order to accomplish this we make the further assumption (cf. [1]) that the function G is
locally "dissipative," in the sense that there exists a positive constant m such that

$$w \cdot [G(x,V+w) - G(x,V)] \ge m\|w\|^2 \quad \text{in A}, \tag{3.7}$$

for all w of interest. Let us now proceed as in the previous section by looking for a
solution in A of the initial value problem

$$\varepsilon B W_x = F(V+W) - F(V), \quad \|W(x_0,\varepsilon)\| \ge \|\psi\|_\infty. \tag{3.8}$$

In view of the entropy inequalities (3.5) the problem (3.8) has a solution $W = W(x,\varepsilon)$ as $\varepsilon \to 0$
such that $W_i > 0$ and $W_i = O\big(W_i(x_0,\varepsilon)\exp[\pm\gamma_i(x - x_0)^2/2\varepsilon^2]\big)$ in A for known positive constants
$\gamma_i$ ($1 \le i \le n$). Note that in contrast to the solution of the scalar problem (2.5) the solution of
the vector problem has components which behave like inverted $\delta$-functions as well as like
the $\delta$-function described by (2.6). It follows that each component of the gradient of W
satisfies an estimate in A of the form

$$W_{i,x}(x,\varepsilon) = O\big(W_i(x_0,\varepsilon)[\,|x - x_0|/\varepsilon + W_i(x_0,\varepsilon)]/\varepsilon\big). \tag{3.9}$$

Using W we can construct the vector barrier function

$$\Omega(x,t,\varepsilon) := W(x,\varepsilon)\,e^{-qt}, \quad 0 < q < m.$$


Inserting $\Omega$ into (3.6) gives us

$$-qWe^{-qt} + \{F(V + We^{-qt}) - F(V)\}_x + G(x,V + We^{-qt}) - G(x,V) = \varepsilon B W_{xx}e^{-qt}$$

or, by virtue of (3.8),

$$-qWe^{-qt} + \{F(V + We^{-qt}) - F(V)\}_x - [F(V + W)e^{-qt} - F(V)e^{-qt}]_x + G(x,V + We^{-qt}) - G(x,V) = 0. \tag{3.10}$$

Now

$$\{\cdot\} = dF(V)We^{-qt} + \tfrac12 P(W,W)e^{-2qt} + O(e^{-3qt})$$

and

$$[\cdot] = dF(V)We^{-qt} + \tfrac12 Q(W,W)e^{-qt},$$

where P and Q are bilinear forms representing quadratic terms, and so the i-th component
of $\{\cdot\}_x - [\cdot]_x$ is of the order $O(W_i W_{i,x}e^{-qt})$ in A. Thus if we take A sufficiently small we
see from the estimates (3.9) that the gradient terms in the comparison equation (3.10)
may be neglected to lowest order relative to the other three terms, that is, we can
replace (3.6) with the simpler system

$$w_t + G(x,V+w) - G(x,V) = 0. \tag{3.11}$$

Upon dotting both sides of (3.11) with w and using the stability condition (3.7) we find that

$$(\tfrac12\|w\|^2)_t \le -m\|w\|^2,$$

and so $\lim_{t \to \infty} \|w\| = 0$ in A provided $\|\psi\|$ is sufficiently small. Thus with the aid of the
entropy conditions (3.5) and the dissipativity condition (3.7) we have outlined an argument
suggesting the asymptotic stability of the steady shock solution V.
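The last step can be made explicit. The short sketch below assumes only (3.7) and (3.11), with $\|\cdot\|$ the Euclidean norm of w at a fixed x in A; it is a standard differential-inequality argument and adds nothing beyond what the text states.

```latex
\begin{align*}
\tfrac12 \frac{\partial}{\partial t}\|w\|^2 = w \cdot w_t
  &= -\,w \cdot [G(x,V+w) - G(x,V)] \le -m\|w\|^2 , \\
\Longrightarrow \quad \|w(x,t,\varepsilon)\|
  &\le \|w(x,0,\varepsilon)\|\,e^{-mt} = \|\psi(x,\varepsilon)\|\,e^{-mt} .
\end{align*}
```

The decay rate is therefore at least m, provided w stays in the range where (3.7) holds, which is where the smallness of the initial perturbation is used.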

Let us return finally to the duct problem (3.1) and see how this general analysis
applies to the gasdynamic equations. For the sake of simplicity we consider isentropic
flow, for which the eigenvalues of the corresponding Jacobian matrix df are $\lambda_1 := u - s$ and
$\lambda_2 := u + s$. Here u is the velocity and s is the sound speed; cf. [7,8]. Thus the isentropic
system (3.1) is strictly hyperbolic, and by virtue of the entropy condition (3.4), we see that

$$\lambda_1(V_L) > 0 > \lambda_1(V_R), \quad 0 < \lambda_2(V_R),$$

that is, the steady shock wave whose stability is being tested is a 1-shock (cf. (3.5)). It
only remains to investigate under what conditions the term $c(x)g(v)$ is locally dissipative in
the sense of (3.7). To this end, note that $g(v) = (\rho u, \rho u^2)$ may be written as $u(\rho, \rho u) = u(v_1, v_2)$,
and so $c(x)[g(V+w) - g(V)] = c(x)U(w_1, w_2)$ satisfies (3.7) provided c>0 in a
neighborhood of the viscous layer, in agreement with the inviscid results of Embid et al.
and Liu.
EXTENSIONS OF SARKOVSKII'S THEOREM

Introduction

It is known that a four-periodic orbit may imply a three-periodic orbit and hence
an n-periodic orbit for every n = 1,2,... [1, Theorem 2; 2, Theorem 3]. Such an orbit
has, of course, not the same structure (or, as we say, is not of the same "type") as the
four-periodic orbit that appears in the Sarkovskii ordering. To clarify the nature of
such and similar implications that are not accounted for in Sarkovskii's theorem, we
introduced the notions of loop and infinite loop. Our investigations in this direction,
begun under the US Army Summer Faculty Research and Engineering Program 1983
and continued in the 1984 and 1985 Program, showed that there is (i) an extension to
the left in the Sarkovskii ordering and (ii) that there are infinitely many additional links
within the ordering. The principal result, Theorem (SR), that summarizes this
investigation, is stated in Section II. For this presentation we have singled out some results
from our more recent work associated with elementary orbits, i.e., orbits of a certain
type we christened "elementary." A sequence of theorems delineating the properties of
elementary orbits culminates in the complete ordering of these orbits. The ordering of
N (N := the set of natural numbers) for the elementary orbits is different from the
Sarkovskii ordering. Taking a few further steps, we arrive at Theorem (SRII), from which,
as a corollary, Sarkovskii's theorem follows in a most natural way.
I. Definitions and Notation


Let $f: \mathbb{R} \to \mathbb{R}$ be continuous and $x_0 \in \mathbb{R}$. The orbit of $x_0$ under f is defined as the set
$\{x : x = f^n(x_0),\ n = 0, 1, \ldots\}$, where, for every positive integer n, $f^n$ is the n-th iterate of f,
$f^1 = f$, and $f^0(x_0) = x_0$. We shall write $x_n := f^n(x_0)$ for a given $x_0 \in \mathbb{R}$ and call $x_1, x_2, \ldots$
the successors of $x_0$. A pre-orbit of a given $x_0 \in \mathbb{R}$ is any (finite or infinite) sequence
$x_0, x_{-1}, x_{-2}, \ldots$ such that $f(x_{-n}) = x_{-(n-1)}$ for all n for which $x_{-n}$ is defined. The points
$x_{-1}, x_{-2}, \ldots$ in any such sequence are called predecessors of $x_0$. A point $c_0$ is called
critical if $f(c_0) = c_0$, i.e., a critical point of f is a fixed point of f. A periodic point $x_0$ of
period $p > 1$ (p a positive integer) is a point for which the relations
$f^p(x_0) = x_0$, $f^k(x_0) \neq x_0$, $1 \le k < p$, hold. If $x_0$ is a periodic point of period p, its orbit
is denoted by $(x_0, x_1, \ldots, x_{p-1})$. We shall denote the k-th iterate of $x_0$ under the
function $f^m$ by $x_k^m$, $k = 0, 1, \ldots$. Thus $x_k^m := (f^m)^k(x_0) = x_{mk}$ and, in particular,
$x_0^m = x_k^0 = x_0$ for all nonnegative integers k and m.
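
The definitions above translate directly into a few lines of code. The following is a minimal sketch in Python and is not part of the original paper: the helper names `iterate`, `orbit_segment`, and `period`, and the tent map used as a test function, are assumptions chosen for illustration. Exact rational arithmetic is used so that the test $f^p(x_0) = x_0$ is not spoiled by rounding.

```python
from fractions import Fraction


def iterate(f, x0, n):
    """Return f^n(x0), with the convention f^0(x0) = x0."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x


def orbit_segment(f, x0, n):
    """Return [x_0, x_1, ..., x_n], where x_k = f^k(x0)."""
    points = [x0]
    for _ in range(n):
        points.append(f(points[-1]))
    return points


def period(f, x0, max_period=64):
    """Smallest p >= 1 with f^p(x0) == x0, or None if none is found."""
    x = x0
    for p in range(1, max_period + 1):
        x = f(x)
        if x == x0:
            return p
    return None


def tent(x):
    """Tent map on [0, 1]; an illustrative choice, not taken from the paper."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x
```

With this test map, `period(tent, Fraction(2, 7))` finds a periodic point of period 3, and `period` returning 1 identifies a critical (fixed) point in the sense used above.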

Definition. Let $f: \mathbb{R} \to \mathbb{R}$ be continuous and $x_0 \in \mathbb{R}$. f has a loop of order n if $x_0$ has a
pre-orbit $(x_0, x_{-1}, \ldots, x_{-n})$ such that either

$$x_0 \le x_{-n} < x_{-(n-1)} < \cdots < x_{-2} < x_{-1}$$

or

$$x_0 \ge x_{-n} > x_{-(n-1)} > \cdots > x_{-2} > x_{-1}.$$

f has an infinite loop if $x_0$ has an infinite pre-orbit $(x_0, x_{-1}, \ldots, x_{-n}, \ldots)$ such that either

$$x_0 < \cdots < x_{-n} < x_{-(n-1)} < \cdots < x_{-2} < x_{-1}$$

or

$$x_0 > \cdots > x_{-n} > x_{-(n-1)} > \cdots > x_{-2} > x_{-1}.$$

A loop of order (n-1) is called an n-periodic loop if $x_0 = x_{-n}$.

Definition. A periodic orbit $(x_0, x_1, \ldots, x_{n-1})$ of period n is called elementary if

$$x_{2\nu} < \cdots < x_4 < x_2 < x_0 < x_1 < x_3 < \cdots < x_{2k-1}$$

or

$$x_{2\nu} > \cdots > x_4 > x_2 > x_0 > x_1 > x_3 > \cdots > x_{2k-1},$$

where $\nu + 1 = k = \frac{n}{2}$ when n is even and $\nu = \frac{n-1}{2} = k$ when n is odd.

An infinite pre-orbit $(x_0, x_{-1}, x_{-2}, \ldots)$ is called elementary if the inequalities

$$x_{-2} < x_{-4} < \cdots < x_0 < \cdots < x_{-3} < x_{-1}$$

or

$$x_{-2} > x_{-4} > \cdots > x_0 > \cdots > x_{-3} > x_{-1}$$

hold.
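
The interleaving condition for a periodic orbit can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, and treating "elementary" as a property that may depend on which point of the cycle is labelled $x_0$ is my reading of the definition, which is why a second helper tries every labelling.

```python
from fractions import Fraction


def is_elementary(points):
    """Check the interleaving condition for one period [x_0, ..., x_{n-1}].

    The points must be listed in iteration order, starting at x_0.
    """
    evens = points[0::2]          # x_0, x_2, x_4, ...
    odds = points[1::2]           # x_1, x_3, x_5, ...
    chain = evens[::-1] + odds    # ..., x_4, x_2, x_0, x_1, x_3, ...
    increasing = all(a < b for a, b in zip(chain, chain[1:]))
    decreasing = all(a > b for a, b in zip(chain, chain[1:]))
    return increasing or decreasing


def has_elementary_labelling(points):
    """Try every choice of starting point x_0 along the cycle."""
    n = len(points)
    return any(is_elementary(points[i:] + points[:i]) for i in range(n))


# A 3-cycle 4/7 -> 6/7 -> 2/7 -> 4/7 (it occurs for the tent map on [0, 1]).
cycle = [Fraction(4, 7), Fraction(6, 7), Fraction(2, 7)]
```

Here `is_elementary(cycle)` holds because $x_2 < x_0 < x_1$, whereas the same cycle listed from 2/7 fails the test for that particular labelling.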

We adopt the following concise notation: we say property $P(k)$ holds if f has a
periodic orbit of period k. Thus $P(1)$, $L(k)$, $E(k)$, $L(\infty)$, $E(\infty)$ mean that f has a critical
point, a periodic loop of period k, an elementary orbit of period k, an infinite loop, an
infinite elementary pre-orbit, respectively. Similarly $P^n(k)$, $L^n(k)$, $E^n(k)$, $L^n(\infty)$, $E^n(\infty)$
shall mean that $f^n$ has a k-periodic orbit, a k-periodic loop, a k-periodic elementary orbit, an
infinite loop, an infinite elementary pre-orbit, respectively. The implication "A implies
B" is denoted by $A \to B$, and "A iff B" is denoted by $A \leftrightarrow B$.

II. Sarkovskii's Theorem and Theorem (SR)

Using  the  notation  introduced  in  Section  I,  Sarkovskii’s  theorem  and  our  refinement 
read  as  follows. 

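Theorem (Sarkovskii). Let $f: \mathbb{R} \to \mathbb{R}$ be continuous. Then

$$\begin{array}{l}
P(3) \to P(5) \to P(7) \to \cdots \to \\
P(2\cdot 3) \to P(2\cdot 5) \to P(2\cdot 7) \to \cdots \to \\
P(2^2\cdot 3) \to P(2^2\cdot 5) \to P(2^2\cdot 7) \to \cdots \to \\
\cdots \to \\
P(2^3) \to P(2^2) \to P(2) \to P(1).
\end{array}$$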
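
The ordering in Sarkovskii's theorem can be encoded as a sort key. The following Python sketch is an illustration, not part of the paper; the names `sarkovskii_key` and `implies` are hypothetical. It writes $n = 2^a q$ with q odd, places the numbers with $q \ge 3$ first (ordered by a, then by q), and places the pure powers of two last, in decreasing order.

```python
def sarkovskii_key(n):
    """Position of n in the Sarkovskii ordering; smaller keys come first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    if n > 1:
        # n = 2^a * q with q odd and q >= 3: ordered by a, then by q.
        return (0, a, n)
    # Pure powers of two come last, largest exponent first.
    return (1, -a, 0)


def implies(m, n):
    """True if P(m) -> P(n) follows from Sarkovskii's theorem."""
    return sarkovskii_key(m) <= sarkovskii_key(n)


ordering = sorted(range(1, 13), key=sarkovskii_key)
```

For the integers 1 through 12 this yields the order 3, 5, 7, 9, 11, 6, 10, 12, 8, 4, 2, 1.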
Theorem (SR). Let $f: \mathbb{R} \to \mathbb{R}$ be continuous. Then

$$\begin{array}{l}
L(\infty) \;\cdots \to L(5) \to L(4) \to L(3) \to \\
P(3) \to P(5) \to P(7) \to \cdots \to \\
L^2(\infty) \;\cdots \to L^2(5) \to L^2(4) \to L^2(3) \to \\
P(2\cdot 3) \to P(2\cdot 5) \to P(2\cdot 7) \to \cdots \to \\
L^{2^2}(\infty) \;\cdots \to L^{2^2}(5) \to L^{2^2}(4) \to L^{2^2}(3) \to \\
P(2^2\cdot 3) \to P(2^2\cdot 5) \to P(2^2\cdot 7) \to \cdots \to \\
\cdots \to P(2^3) \to P(2^2) \to P(2) \to P(1).
\end{array}$$

For  the  proof  of  Theorem  (SR)  the  reader  is  referred  to  [3]. 

III. The Hierarchy of Elementary Orbits

Theorem 1. If f has an elementary orbit of period (2n+1), then it has two distinct
elementary orbits of period (2n+3), i.e., $E(2n+1) \to E(2n+3)$, $n = 1, 2, \ldots$.

Theorem 2. $E(2n+1) \to E(\infty)$, $n = 1, 2, \ldots$.

Theorem 3. $E(\infty) \to E(2m)$, $m = 1, 2, \ldots$.

Theorem 4. $E(2m+2) \to E(2m)$, $m = 1, 2, \ldots$. There exist two distinct elementary
orbits of period 2m if $m = 3, 4, \ldots$.

Combining  these  four  theorems,  we  obtain  the  complete  ordering  of  elementary 
orbits. 

Theorem 5 (The Complete Ordering of Elementary Orbits).

$$E(3) \to E(5) \to E(7) \to \cdots \to E(\infty) \to \cdots \to E(8) \to E(6) \to E(4) \to E(2) \to E(1).$$
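
The ordering of Theorem 5 differs from the Sarkovskii ordering in that all even periods follow $E(\infty)$ in decreasing order. A minimal Python sketch of this ordering follows; the function names are hypothetical and `math.inf` stands in for the symbol $\infty$.

```python
import math


def elementary_key(n):
    """Position of E(n) in the ordering of Theorem 5; smaller keys come first."""
    if n == math.inf:
        return (1, 0)             # E(inf) sits between the odd and even periods
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer or math.inf")
    if n == 1:
        return (3, 0)             # E(1) is last
    if n % 2 == 1:
        return (0, n)             # odd periods 3, 5, 7, ... in increasing order
    return (2, -n)                # even periods ..., 8, 6, 4, 2 in decreasing order


def elementary_implies(m, n):
    """True if E(m) -> E(n) according to Theorem 5."""
    return elementary_key(m) <= elementary_key(n)


chain = sorted([1, 2, 3, 4, 5, 6, 7, 8, math.inf], key=elementary_key)
```

For example, `elementary_implies(6, 4)` holds here, and `chain` places `math.inf` between 7 and 8.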

Consider now the complete ordering of elementary orbits for $f, f^2, \ldots, f^{2^n}, f^{2^{n+1}}, \ldots$.
Then the implications $E(6) \to E^2(3)$ and $E^2(2) \to E(2)$ provide the linkage shown in the
following important result.
Theorem 6. Let $f: \mathbb{R} \to \mathbb{R}$ be continuous. Then


That the Sarkovskii ordering is contained in the ordering established in Theorem 6
follows from the implications $P(2n+1) \leftrightarrow E(2n+1)$, $E(2) \leftrightarrow P(2)$, and $P^2(n) \leftrightarrow P(2n)$,
which, in turn, ensure that

$$E^{2^n}(2m+1) \leftrightarrow P((2m+1)2^n)$$

and

$$E^{2^n}(2) \leftrightarrow P(2^{n+1}).$$

In terms of properties P(k) and E(k) we have, therefore, the following refinement of
Sarkovskii's theorem.

Theorem (SRII). Let $f: \mathbb{R} \to \mathbb{R}$ be continuous. Then
Abstract


It is known that the model for a single unforced fluid film journal bearing shows Hopf bifurcation
to both stable and unstable limit cycles. In certain parameter ranges, both limit cycles exist at the
same time. When a rotating unbalance is introduced into the system, it is expected that some period
doubling bifurcations will occur as the amplitude of the forcing is increased. The Poincaré map generated
by simulation of the equations of motion for the system is used to show these bifurcations for a particular
bearing system. By varying system parameters, the Poincaré map can be used to build up a catalogue
of possible behaviors for the journal bearing system. At present, the range of possible behaviors is
unknown. With the introduction of periodic forcing, many nonlinear systems are known to show chaotic
behavior in certain regions of their parameter space. The Poincaré map shows that apparently chaotic
behavior is possible for the journal bearing system for certain sets of parameter values.


Journal  Bearing  Equations 


Fig. 1 shows a rotor of diameter D and weight W spinning with angular velocity $\Omega$ in a journal
bearing of length L. The position of the rotor is given in polar coordinates by E and $\Phi$ and in cartesian
coordinates by X and Y. The clearance C is the difference between the radius of the journal and the
radius of the rotor. The space between the rotor and the journal is filled with a fluid. As the rotor spins,
pressure forces which support the rotor are generated. By solving Reynolds' equation for this journal
bearing system, the radial and tangential forces on the rotor, $F_E$ and $F_\Phi$, can be found. These forces
can also be resolved into components $F_X$ and $F_Y$ in cartesian coordinates. The equations of motion are
most often nondimensionalized to reduce the number of parameters [1], [2], [3], [4], [5], [6].

Using  the  finite  model  proposed  by  [2],  the  nondimensional  equations  of  motion  become 


where 


fr  =  4  [afftle/j02  +  fl(g,  Mi  -  Ij/j11] 

/,  =  -4  [a(g*)eJl'  +<giMi  -  i-V320] 


rim  _  /*,+* 

•  L  a 


sin  9  cos"  9 


+£Cos  9)* 


dd. 


9\  =  tan-1  ( - -) 

'£(©-20)'' 


a(x) 


3(x  —  tanhx) 
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and 


Si 

gi 

Sj 


rtT  r, + _j _ i_i 

e2  Lf)J  [_  Vl  —  £l  V4  -£*J 

nV+e1)  fZ/]2  r  1 _ 1  ~1 

2  LdJ  [_V1  -e1  V4  -f2J 
2  r L"|2  r_  ,  2 (K  -y,)  8(*  -y2f| 
c*  LdJ  L  vi^f  V^fJ 

2  TiT  \*z*  _  JLzZL] 

LdJ  Lvi  —  £2  V4  -e2J 


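$$\gamma_1 = \tan^{-1}\frac{\sqrt{1 - \varepsilon^2}}{\varepsilon}, \qquad \gamma_2 = \tan^{-1}\frac{\sqrt{4 - \varepsilon^2}}{\varepsilon}.$$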


The  Jl*  are  often  referred  to  as  the  bearing  integrals. 

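The nondimensional forces are expressed in cartesian form by using the relations

$$f_x = -\left(f_r(\varepsilon, \phi, \dot\varepsilon, \dot\phi)\cos\phi + f_t(\varepsilon, \phi, \dot\varepsilon, \dot\phi)\sin\phi\right),$$
$$f_y = -\left(f_r(\varepsilon, \phi, \dot\varepsilon, \dot\phi)\sin\phi - f_t(\varepsilon, \phi, \dot\varepsilon, \dot\phi)\cos\phi\right).$$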

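Thus, the nondimensional equations of motion in state variable form are

$$\frac{d}{dt}\begin{pmatrix} x \\ y \\ \dot{x} \\ \dot{y} \end{pmatrix}
= \begin{pmatrix} \dot{x} \\ \dot{y} \\ \dfrac{f_x}{\omega\sqrt{m/\sigma}} + \dfrac{1}{\omega^2} \\ \dfrac{f_y}{\omega\sqrt{m/\sigma}} \end{pmatrix}
= \mathbf{f}(\mathbf{x}).$$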


The parameters to be used are the nondimensional running speed $\omega$, the bearing parameter $m/\sigma$, and
the finite length parameter L/D. The parameter $m/\sigma$ describes most of the geometry of a particular
bearing, while the parameter L/D completes the description.


Hopf  Bifurcations  of  Journal  Bearings 


As the parameter $\omega$ is varied, a Hopf bifurcation occurs [1], [3], [5], [6]. Fig. 2 shows the resulting
stability boundary and bifurcation diagrams for the case L/D = 0. For $m/\sigma > 1.6$, stable supercritical
limit cycles are predicted, while unstable subcritical limit cycles are predicted for $m/\sigma < 1.6$. Near the
transition from stable to unstable limit cycles, two limit cycles are expected to exist at the same time,
one being stable, the other unstable [7]. Without resorting to the calculation of higher order terms
for the bifurcation coefficients, conventional Hopf bifurcation theory gives no clue as to which of the two
possible bifurcation diagrams in Fig. 3 occurs. Simulation of the equations of motion for certain parameter
values shows that the stable limit cycle surrounds the unstable one, case (b) in Fig. 3. The Poincaré
map is used to examine this region in more detail. The bifurcation diagram can be built up, and the
effects of speed changes can be seen more easily. The four-dimensional system of equations reduces to
a three-dimensional Poincaré map, an example of which is shown in Fig. 4. The X and Y coordinates
correspond to the x and y coordinates of the equations, while the Z coordinate corresponds to $\dot{x}$ in the
equations. The map actually used samples the trajectory whenever $\dot{y} = 0$ with $\dot{y}$ increasing. The three planar views
X-Y, X-Z, and Y-Z are projections to two dimensions of the 3-dimensional view shown in the lower
right. This X-Y-Z view is a picture of the complete Poincaré map with coordinates x, y, and $\dot{x}$ for the
bearing system.

Figure 3. Possible bifurcation diagrams.
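
Sampling a trajectory on a section of this kind amounts to detecting sign changes of a scalar function along the numerical solution. The sketch below illustrates the idea in Python under stated assumptions: it uses a fixed-step Runge-Kutta integrator and linear interpolation between steps, and it substitutes a simple harmonic oscillator for the bearing equations, which are not reproduced here. All function names are hypothetical.

```python
import math


def rk4_step(f, x, h):
    """One classical Runge-Kutta step for the autonomous system x' = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def poincare_section(f, x0, g, h, n_steps):
    """Collect states where g(x) crosses zero from negative to positive."""
    hits = []
    x, gx = list(x0), g(x0)
    for _ in range(n_steps):
        x_new = rk4_step(f, x, h)
        g_new = g(x_new)
        if gx < 0.0 <= g_new:
            s = gx / (gx - g_new)   # linear interpolation weight in [0, 1]
            hits.append([a + s * (b - a) for a, b in zip(x, x_new)])
        x, gx = x_new, g_new
    return hits


def oscillator(state):
    """Stand-in system y'' = -y with state (y, ydot); not the bearing model."""
    y, ydot = state
    return [ydot, -y]


# Section: ydot = 0 with ydot increasing, as in the sampling rule described above.
hits = poincare_section(oscillator, [1.0, 0.0], lambda s: s[1], 0.01, 2000)
```

For the stand-in oscillator every section point lies near y = -1, as expected for a single closed orbit; for a four-dimensional state the same routine returns points whose remaining three coordinates form the map.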

It can be seen that there are three fixed points, one corresponding to the stable fixed point, one
corresponding to the unstable limit cycle, and the other to the stable limit cycle. The top fixed point
is the stable fixed point of the system and the lower one is the fixed point associated with the stable
limit cycle. The middle, unstable fixed point corresponds to the unstable limit cycle. Different point types represent
solutions starting from different initial conditions, which are not shown. All the fixed points found and
shown have periods within 1% of $4\pi$. Examining the maps as speed is varied shows that most
of the behavior lies on a 1-dimensional curve, which means that two of the eigenvalues of the map have
very small magnitude. Neglecting these eigenvalues and looking only at the behavior along the curve, the
effect of speed on behavior can be easily seen. Initially, there is only one fixed point, the stable fixed
point of the original system. As speed increases, there appears a pair of fixed points, one stable, one
unstable, which appear to diverge from a single point. They represent the appearance of the stable
and unstable limit cycles below the critical speed. This pair of fixed points moves further apart until
the unstable point meets the original stable fixed point at the critical speed. This corresponds to the
unstable limit cycle shrinking down to the equilibrium point at the critical speed. Further increases in
speed turn this into an unstable fixed point. The large amplitude limit cycle persists as a stable fixed
point moving closer to the clearance boundary.

It is now possible to generate a more complete bifurcation diagram, Fig. 5, by using the fixed point
information from the Poincaré maps. The boundary of clearance is ±1, and it is unstable. The force
becomes infinite at the boundary and so, theoretically, collisions with the boundary are not possible. No
account is taken of oil film breakdown or surface deformations.

The program used to generate the Poincaré maps is an interactive set of command level programs
and Fortran programs allowing the user to choose initial conditions on the picture, to interactively change
certain attributes of the individual curves, and to easily change system parameters. It does suffer from
the same drawbacks as simulation. A large number of initial conditions need to be tried to be certain
of capturing all the relevant behavior, and the simulations tend to be computationally slow, lessening
the interactive nature of the program. The interactive nature of the program does allow more rapid
investigation of interesting areas than some discretization of initial condition space, which would use a
significant amount of time producing essentially equivalent sets of points.

There are other methods for generating more complete bifurcation diagrams for autonomous
systems. Continuation methods such as those described by [8] and [9] can also be used to trace out
branches  of  bifurcating  steady  state  solutions.  In  fact,  the  method  described  by  [9]  can  be  used  to 
trace  out  branches  of  periodic  solutions  of  limit  cycles  arising  from  Hopf  bifurcation.  These  methods  do 
require  having  some  initial  solution  such  as  a  fixed  point  or  limit  cycle  from  which  the  method  begins. 
Newton’s  method  and  some  minimization  techniques  may  also  be  used  to  find  limit  cycles  but  their 
implementation  is  sometimes  difficult. 


Unbalanced  Journal  Bearing  Equations 


The Poincaré map does allow consideration of periodically forced systems. For the journal bearing,
this is particularly useful since most bearing systems do suffer from some form of rotating unbalance.
The nondimensional equations of motion for a journal bearing system with an unbalance rotating at the
same frequency as the rotor can easily be shown to be

$$\ddot{x} = \frac{1}{\omega\sqrt{m/\sigma}}\, f_x + \frac{1}{\omega^2} + a\cos t,$$

$$\ddot{y} = \frac{1}{\omega\sqrt{m/\sigma}}\, f_y + a\sin t,$$

where a is a further nondimensional parameter representing the magnitude of the unbalance.

Effect  of  Rotating  Unbalance 

For this periodically forced system, the most natural Poincaré map is to sample at period $2\pi$, the
period of forcing. The journal bearing system transformed to an autonomous system of equations is
really 5-dimensional, with coordinates $(x, y, \dot{x}, \dot{y}, t)$. The resulting map is a 4-dimensional map from phase space to phase
space. To see the behavior requires further projections down to 2 or 3 dimensions.
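
A map that samples at the forcing period (a stroboscopic map) can be sketched as follows. This Python example is an illustration under stated assumptions: the bearing forces are replaced by a damped linear oscillator with forcing $a\cos t$, the integrator is a fixed-step Runge-Kutta scheme, and all names and parameter values are hypothetical.

```python
import math


def rk4_step(f, t, x, h):
    """One classical Runge-Kutta step for the forced system x' = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + 0.5 * h, [xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = f(t + 0.5 * h, [xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def stroboscopic_map(f, x0, period, n_periods, steps_per_period=200):
    """Return the state sampled once per forcing period."""
    h = period / steps_per_period
    x, samples = list(x0), []
    for k in range(n_periods):
        for j in range(steps_per_period):
            x = rk4_step(f, k * period + j * h, x, h)
        samples.append(list(x))
    return samples


def forced_oscillator(t, state, zeta=0.5, a=1.0):
    """Stand-in system x'' + 2*zeta*x' + x = a*cos(t); not the bearing model."""
    x, v = state
    return [v, -2.0 * zeta * v - x + a * math.cos(t)]


samples = stroboscopic_map(forced_oscillator, [0.0, 0.0], 2.0 * math.pi, 30)
```

An orbit whose period equals the forcing period appears as a single fixed point of the sampled sequence, as it does for this stand-in system; an orbit of twice the forcing period would appear as two alternating points.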

The unforced system with both large stable and unstable limit cycles existing below the critical
speed, Fig. 4, will now have a rotating unbalance introduced. For small amplitudes of unbalance, the
behavior of the system is still close to that of the unforced case, except that the stable fixed
point at the top now corresponds to a small amplitude orbit of period $2\pi$. The other two fixed points
have period $4\pi$. As the amplitude of forcing is increased, the top two fixed points tend towards each
other: the $2\pi$-periodic stable point approaches the $4\pi$-periodic unstable point or, in terms of orbits, the
$2\pi$-periodic stable orbit approaches the $4\pi$-periodic unstable orbit, Fig. 6. By a = 0.016, Fig. 7, the two points
have coalesced into a single unstable saddle point of period $2\pi$. As the forcing is increased further, the
$2\pi$-periodic unstable saddle point moves closer to the $4\pi$-periodic stable fixed point, or, in the original
configuration space, the unstable orbit approaches the large stable orbit, Fig. 6. By a = 0.17, the
orbit has changed from one large orbit with no loops to one with a loop in it, Fig. 6. On further increasing
the magnitude of the unbalance, the two fixed points of the stable orbit coalesce with the saddle
of the unstable orbit to form a single stable fixed point of period $2\pi$ with two negative real eigenvalues.
On the (x, y) plane, the inner loop of the stable $4\pi$-periodic orbit appears to have coalesced with the
outer loop to form a single $2\pi$-periodic orbit. The Poincaré maps for this system all seem to lie almost
on a plane, indicating that one of the eigenvalues of the map remains very small. If Fig. 6 is looked at
in reverse order, two period doubling bifurcations can be seen to occur. The stable orbit of period $2\pi$
bifurcates to a stable orbit of period $4\pi$ and an unstable orbit of period $2\pi$. Then, the unstable orbit of
period $2\pi$ bifurcates to an unstable orbit of period $4\pi$ and a stable orbit of period $2\pi$.

Conclusions 

By generating such Poincaré map pictures over the parameter space of the bearing system, a
catalogue of all the possible behaviors could be built up. The boundaries separating topologically different
behavior could be easily seen. It would be an extremely tedious task to run enough cases over the
entire four-parameter space to capture all the possible behavior types, which are expected to range from
very simple to very complex. At present, the full range of possible types of behavior is unknown.
Fig. 8 shows some complicated behavior for the journal bearing system. The map appears to have no
periodic points, but there does appear to be some structure. All the points tend to lie along some form
of curve, suggesting at least two dominant frequencies in the response. The trajectory in the (x, y)
plane for t from 400 to 800 is shown in Fig. 9. Most of the time, the trajectory is almost circular, but
an extra small loop seems to occasionally appear, grow, and disappear again. The frequency content
of the trajectory in this time period is shown in Fig. 10. The trajectory is mostly composed of three
main frequencies (the forcing frequency 1 and two subharmonics) and some smaller amounts
of frequencies between these. This behavior seems to exist only for a narrow range of parameter values.
Small changes in a can cause only periodic motion to appear.
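
The frequency content of a sampled trajectory can be estimated with a discrete Fourier transform over a window covering a whole number of forcing periods. The sketch below is illustrative only: the synthetic signal, with amplitude 1 at the forcing frequency and amplitude 0.5 at the subharmonic 1/2, is an assumption made for the example and is not data from the bearing simulation.

```python
import cmath
import math


def dft_amplitudes(samples):
    """One-sided amplitude spectrum of a real signal (direct O(n^2) DFT)."""
    n = len(samples)
    amps = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                for j, x in enumerate(samples))
        # Bins other than the mean and the Nyquist bin carry half the amplitude.
        scale = 1.0 if (k == 0 or 2 * k == n) else 2.0
        amps.append(scale * abs(s) / n)
    return amps


n = 256
window = 16.0 * math.pi            # eight forcing periods of length 2*pi
times = [j * window / n for j in range(n)]
signal = [math.cos(t) + 0.5 * math.cos(0.5 * t) for t in times]
amps = dft_amplitudes(signal)

# Bin k corresponds to angular frequency 2*pi*k/window, i.e. k/8 here.
peaks = {k / 8.0: round(a, 6) for k, a in enumerate(amps) if a > 1e-6}
```

In this synthetic case `peaks` contains exactly the two frequencies 1.0 and 0.5; energy spread over the bins between such peaks would indicate the smaller intermediate frequencies mentioned above.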

Once the fixed points of the Poincaré map are known, it is possible to define their various regions
of attraction. For a two-dimensional phase space, it is a straightforward task to check where each initial
condition point goes and mark it accordingly. In 3 dimensions or higher, the visualization is much more
difficult since the regions of attraction are volumes and the number of initial condition points to be
checked increases dramatically. For the case of nonperiodic behavior such as in Fig. 8, it is not so clear
how to interpret where the trajectory is going. There may even exist periodic fixed points for some
trajectories.
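
For a two-dimensional map, marking each initial condition by the fixed point it approaches can be sketched as below. This is a hedged illustration: `toy_map` is a hypothetical planar map with two attracting fixed points at (1, 0) and (-1, 0), chosen only because its regions of attraction are known (the sign of x); it is not the bearing map.

```python
def classify(point, step, attractors, n_iter=500, tol=1e-6):
    """Iterate the map and return the index of the attractor reached, or None."""
    x = point
    for _ in range(n_iter):
        x = step(x)
    for label, a in enumerate(attractors):
        if max(abs(xi - ai) for xi, ai in zip(x, a)) < tol:
            return label
    return None


def basin_grid(step, attractors, xs, ys):
    """Label every grid point (x, y); rows follow ys, columns follow xs."""
    return [[classify((x, y), step, attractors) for x in xs] for y in ys]


def toy_map(p, h=0.1):
    """Hypothetical planar map with attracting fixed points at (+1, 0) and (-1, 0)."""
    x, y = p
    return (x + h * (x - x ** 3), y - h * y)


attractors = [(1.0, 0.0), (-1.0, 0.0)]
grid = basin_grid(toy_map, attractors, [-1.5, -0.5, 0.5, 1.5], [-1.0, 0.0, 1.0])
```

Every row of `grid` reads `[1, 1, 0, 0]`, so the boundary between the two regions runs along x = 0. The cost grows with the number of grid points, which is the difficulty noted above for three or more dimensions.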

The Poincaré map may be useful in examining the structure of the nonperiodic behavior of journal
bearing systems. It may also suggest a type of underlying simpler model which has similar behavior to
the original system. For example, it may be possible to find a lower dimension model for the bearing
system which captures the same nonlinear behavior. If the Poincaré map is planar, the minimum
dimension of the system which could produce it is three. For some cases, it might be possible to perform
coordinate transformations to produce an approximating three-dimensional system. The planar nature
of the Poincaré map is also suggestive of a large parameter in the original equations. This parameter
may give ways of simplifying the equations to lower order. Also, the Poincaré map may be used to
compare various models to ensure they do all show the same nonlinear responses.
ABSTRACT

Spin-stabilized  projectiles  with  liquid  payloads  can  experience  a  severe  flight  instability 
characterized by a rapid yaw-angle growth and a simultaneous loss in spin rate. Laboratory
experiments and field tests have shown that this instability originates from the
internal  fluid  motion  in  the  range  of  small  Reynolds  numbers.  In  earlier  work,  we 
developed  a  simple  model  of  this  flow  based  on  linearized  equations  for  the  deviation 
from  solid-body  rotation  in  an  infinite  cylinder.  Here,  we  perform  a  perturbation 
analysis  in  order  to  estimate  the  effect  of  nonlinear  terms.  Beyond  a  small  correction  of 
the  axial  velocity  component,  we  obtain  radial  and  azimuthal  components  of  the  velocity 
field  in  agreement  with  computational  results  for  the  core  region  of  a  finite-length 
cylinder.  The  analytical  results  are  exploited  in  the  design  of  a  spectral  Navier-Stokes 
solver  for  the  steady  motion  in  a  finite  cylinder.  A  first  raw  version  of  this  spectral  code 
provides  flow  field  and  pressure  distribution  in  a  small  fraction  of  the  computer  time 
required  by  existing  codes.  We  report  some  results  and  discuss  possible  refinements  of 
this  code. 

1.  Introduction 

It is well known that spin-stabilized shells carrying liquid payloads can suffer a
dynamical instability which results in an increased coning (or yaw) angle and a
simultaneous loss in spin rate. Laboratory experiments, computational results, and field tests
indicate that these phenomena arise from the coning-induced fluid motion in a limited
range of small Reynolds numbers. Although in special cases this instability has been
removed by trial and error, future design of reliable projectiles would profit from the
opportunity to estimate the liquid moments, and to include these moments in flight
simulators. The empirical data base [1, 2] is sparse, however, and computational
methods in use [3, 4, 5] are rather demanding.

Our theoretical analysis of this problem serves on the one hand to gain insight into the
anatomy of the flow phenomena and to support the ongoing experiments. On the other
hand, it promotes our efforts to develop a more efficient code for the numerical
simulation of the flow in a finite container. While the analytical work aims at the velocity field
in the core region of a sufficiently long cylinder and at the viscous components of the
moments, in particular the viscous despin (negative roll) moment, the computational
work also captures the flow near the end walls and the pressure contributions to yaw
and pitch moments.

Our previous work [6] shows that the deviation from solid-body rotation is
governed by a small parameter $\varepsilon = (\Omega/\omega)\sin\theta$ involving the nutation rate $\Omega$, the
nutation angle $\theta$, and the spin rate $\omega$. The solution of the linearized equations consists of
only an axial component of order $O(\varepsilon)$. This axial flow is the dominating feature of the
fluid motion and produces a negative roll moment of order $O(\varepsilon^2)$ owing to Coriolis
forces. Although these results are in reasonable agreement with experimental and
computational data, one may anticipate modifications of velocity field and roll moment if
nonlinear terms are taken into account. Estimates of these nonlinear effects are desired
in order to support previous results and to verify our conclusion that the
three-dimensional flow field in a finite-length cylinder is essentially given by the solution of
linearized momentum equations. In the following, we perform a straightforward
perturbation expansion for the nonlinear problem. We develop and solve the equations for the
flow in an infinitely long cylinder up to order $O(\varepsilon^3)$. A closed-form solution is given for
the radial and azimuthal velocity components at second order. The third-order
equations are solved numerically.

The perturbation solution also provides estimates for the number of expansion
functions required for accurate spectral representation of the radial (r) and azimuthal
($\phi$) structure of the solution. A spectral code appears as an attractive alternative to the
existing Navier-Stokes solvers. The finite-difference code developed at Sandia
Laboratories [3, 4] exploits Chorin's method of artificial compressibility. The steady solution at
$11 \times 24 \times 21$ grid points in the r, $\phi$, z directions is obtained by integrating over typically
10*  time  steps,  a  task  that  requires  68  minutes  of  CPU  time  on  an  IBM  3090.  The  result 
consists of 22,000 plus values for the velocity components $v_r$, $v_\phi$, $v_z$ and the pressure p
that can be utilized for a calculation of the moments. Strikwerda & Nagel [5] describe a
code using finite differences in the radial and axial directions and pseudospectral differencing
in the azimuthal direction. Nonuniform grids are introduced for increased resolution
near the walls. The difference equations are solved by an iterative method based on
successive over-relaxation. The computer time required is comparable to that of the Sandia
code (Nusca, BRL, personal communication). Although the relative merits of the two
codes, especially with respect to the captured range of Reynolds numbers, are not yet
known, it seems quite possible to beat both of these codes in two respects: computer time
and adaptability to the unsteady problem.

For a feasibility study, we have pursued a simple concept that is open to numerous
refinements. We use Chebyshev-Fourier-Chebyshev expansions in r, $\phi$, z, respectively,
and convert the linearized equations into a linear algebraic system for the expansion
coefficients. The solution of this system (or any other solution at neighboring
parameters) is used as initial approximation for iterative improvement by the modified Newton
method. The experience with this code is encouraging with respect to accuracy,
efficiency, and robustness.
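
The iteration strategy described above can be illustrated on a small model problem. The Python sketch below assumes that "modified Newton" means reusing one Jacobian, evaluated at the initial approximation, for all iterations; the source does not spell out the variant, so this is an assumption. The two-equation system, the parameter `EPS`, and all function names are hypothetical stand-ins for the algebraic system of expansion coefficients.

```python
def solve2(J, b):
    """Solve the 2x2 linear system J z = b by Cramer's rule."""
    (a11, a12), (a21, a22) = J
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def modified_newton(F, J0, u0, tol=1e-12, max_iter=100):
    """Newton-like iteration that reuses the fixed Jacobian J0 at every step."""
    u = list(u0)
    for k in range(max_iter):
        r = F(u)
        if max(abs(ri) for ri in r) < tol:
            return u, k
        du = solve2(J0, [-ri for ri in r])
        u = [ui + di for ui, di in zip(u, du)]
    raise RuntimeError("modified Newton did not converge")


EPS = 0.1  # strength of the nonlinear terms in the toy problem


def F(u):
    """Toy nonlinear system; its linear part (EPS = 0) has the solution (1, 2)."""
    return [u[0] + EPS * u[0] * u[1] - 1.0,
            u[1] + EPS * u[0] ** 2 - 2.0]


u_lin = [1.0, 2.0]                                  # solution of the linearized problem
J0 = [[1.0 + EPS * u_lin[1], EPS * u_lin[0]],       # Jacobian of F at u_lin
      [2.0 * EPS * u_lin[0], 1.0]]
u, iterations = modified_newton(F, J0, u_lin)
```

Reusing the Jacobian trades the quadratic convergence of Newton's method for cheaper iterations, which pays off when the initial approximation is already close, as it is here for a weak nonlinearity.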

2.  Governing  Equations 

We consider the motion of a fluid of density $\rho$ and viscosity $\mu$ in a cylinder of
radius a and length 2c that rotates with the spin rate $\omega$ about its axis of symmetry, the
z-axis. We consider the motion with respect to the nutating coordinate system x, y, z.
This system is obtained from the inertial system X, Y, Z by a rotation with the
nutation angle $\theta$ about the axis Y = y. Therefore, x is in the Z, z-plane, and this plane
rotates about the Z-axis with the nutation rate $\Omega$. The two axes of rotation intersect
in the center of mass of the cylinder. We consider $\omega > 0$, $\Omega$, and $0 < \theta < \pi/2$ as
constant.
The fluid motion is governed by the Navier-Stokes equations written in the
nutating coordinate system:

$$\rho\left[\frac{D\mathbf{V}_n}{Dt} + 2\,\boldsymbol{\Omega} \times \mathbf{V}_n + \boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{r})\right] = -\nabla P_n + \mu \nabla^2 \mathbf{V}_n, \qquad (1a)$$

$$\nabla \cdot \mathbf{V}_n = 0. \qquad (1b)$$

$\mathbf{V}_n$ is the velocity measured in the nutating frame, $P_n$ the pressure, and $\mathbf{r}$ the position
vector. Equations (1) are subject to the no-slip and no-penetration conditions at the
cylinder walls.

It is convenient [6] to split the velocity and pressure fields according to

$$\mathbf{V}_n = \mathbf{V}_s + \mathbf{V}_d, \qquad P_n = P_s + P_d, \qquad (2)$$

where $\mathbf{V}_s$, $P_s$ describe the state of pure solid-body rotation, whereas $\mathbf{V}_d$, $P_d$ represent
the deviation from solid-body rotation. The deviation $\mathbf{V}_d$ and the reduced pressure $P_d$
are ultimately responsible for the observed flight instability.


The equations for $\mathbf{V}_d$, $P_d$ are written in terms of nondimensional quantities $\mathbf{v}_d$, $p_d$
using a, $\omega$, and $\rho$ for scaling length, time, and mass, respectively. The solution then
depends on four nondimensional parameters: aspect ratio $\lambda = c/a$, nutation angle $\theta$,
frequency $\tau = \Omega/\omega$, and Reynolds number $Re = \rho\omega a^2/\mu$. The aspect ratio enters the
solution only through the boundary conditions at the end walls of the cylinder. The
boundary conditions on $\mathbf{v}_d$ are homogeneous.
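
The four nondimensional parameters, together with the small parameter $\varepsilon = \tau\sin\theta$ used in the analysis, follow directly from the dimensional data. A minimal Python sketch is given below; the function name and the numerical values are illustrative assumptions, not data from the paper.

```python
import math


def nondimensional_parameters(a, c, omega, Omega, theta, rho, mu):
    """Return aspect ratio, frequency, Reynolds number, and epsilon.

    a, c   : cylinder radius and half-length
    omega  : spin rate
    Omega  : nutation rate
    theta  : nutation angle in radians
    rho, mu: fluid density and viscosity
    """
    lam = c / a
    tau = Omega / omega
    reynolds = rho * omega * a ** 2 / mu
    eps = tau * math.sin(theta)
    return {"lambda": lam, "tau": tau, "Re": reynolds, "epsilon": eps}


# Illustrative values only (SI units assumed).
params = nondimensional_parameters(a=0.05, c=0.2, omega=100.0, Omega=10.0,
                                   theta=math.pi / 6, rho=1000.0, mu=10.0)
```

These example values give an aspect ratio of 4, a frequency of 0.1, a Reynolds number of 25, and $\varepsilon = 0.05$.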


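In cylindrical coordinates r, $\phi$, z, the equations for the nondimensional deviation
velocity $\mathbf{v}_d = (v_r, v_\phi, v_z)$ and pressure $p_d$ take the form

$$\frac{1}{r}\frac{\partial}{\partial r}(r v_r) + \frac{1}{r}\frac{\partial v_\phi}{\partial \phi} + \frac{\partial v_z}{\partial z} = 0, \qquad (3a)$$

$$\tau_r = -\varepsilon\cos\phi, \quad \tau_\phi = \varepsilon\sin\phi, \quad \tau_z = \tau\cos\theta, \quad \varepsilon = \tau\sin\theta. \qquad (4)$$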

The primary effect of nutation is contained in the $\phi$-periodic force term $-2r\tau_r = 2\epsilon r\cos\phi$ in the $z$-momentum equation (3d). For $\epsilon = 0$, equations (3) support the trivial solution $\mathbf{v}_d = 0$, $p_d = 0$. The system also supports the following symmetries:


\[ v_r(r, \phi+\pi, -z) = v_r(r, \phi, z) \qquad (5a) \]

\[ v_\phi(r, \phi+\pi, -z) = v_\phi(r, \phi, z) \qquad (5b) \]

\[ v_z(r, \phi+\pi, -z) = -v_z(r, \phi, z) \qquad (5c) \]

\[ p_d(r, \phi+\pi, -z) = p_d(r, \phi, z) \qquad (5d) \]

3.  Perturbation  analysis  for  an  infinite  cylinder 

The steady flow in a relatively long cylinder (aspect ratio $\lambda > 4$) at low Reynolds number is expected to have a rather simple structure and to provide a roll moment proportional to $R$. In fact, the flow is expected to exhibit little axial variation over much of the cylinder length. Previous work [6] has therefore relaxed the boundary conditions at the end walls. In this way, one seeks the steady flow in a finite segment of an infinitely long cylinder.

In the physical situations of interest, $\epsilon = (\Omega/\omega)\sin\theta$ is a small parameter, $\epsilon < 0.06$. Consequently, it seems reasonable to pursue a straightforward perturbation expansion in $\epsilon$. This provides $\mathbf{v}_d$ in the form

\[ \mathbf{v}_d = \sum_{n=1}^{\infty} \epsilon^n\, \mathbf{v}^{(n)}(r, \phi) \qquad (6) \]

and similar expressions for $p_d$.

The development of general expressions for the expansion coefficients $\mathbf{v}^{(n)}$ from equations (3) indicates an alternating pattern: Odd-order terms contain odd multiples of $\phi$ and contribute only to the axial velocity $v_z$, while even-order terms contain even multiples of the azimuthal coordinate $\phi$ and contribute only to the radial velocity $v_r$ and azimuthal velocity $v_\phi$. Therefore,

\[ \mathbf{v}^{(n)} = \begin{cases} (0,\ 0,\ v_z^{(n)}), & n \text{ odd}, \\ (v_r^{(n)},\ v_\phi^{(n)},\ 0), & n \text{ even}, \end{cases} \]

and the components of $\mathbf{v}^{(n)}$ take the form


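\[ v_r^{(n)} = \sum_{m=1}^{n/2} \left[ u_{nm}(r)\, e^{2im\phi} + \tilde{u}_{nm}(r)\, e^{-2im\phi} \right] , \qquad (8a) \]

\[ v_\phi^{(n)} = v_{n0}(r) + \sum_{m=1}^{n/2} \left[ v_{nm}(r)\, e^{2im\phi} + \tilde{v}_{nm}(r)\, e^{-2im\phi} \right] , \qquad (8b) \]

\[ v_z^{(n)} = \sum_{m=1}^{(n+1)/2} \left[ w_{nm}(r)\, e^{i(2m-1)\phi} + \tilde{w}_{nm}(r)\, e^{-i(2m-1)\phi} \right] , \qquad (8c) \]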

where the tilde denotes the complex conjugate. The aperiodic term in $v_r^{(n)}$ is suppressed by the continuity equation. The $r$-dependent coefficient functions in eqs. (8) are required to satisfy homogeneous boundary conditions at $r = 1$ and to be finite at the axis $r = 0$ for a physically meaningful solution.

At the lowest order $O(\epsilon)$, the $z$-independent force term in eq. (3d) can be balanced only by an axial component of the deviation velocity. This component is the dominating feature of the flow in a long cylinder. The axial velocity at order $O(\epsilon)$ can be found in analytical form,

\[ w_{11}(r) = -i\left[ r - \frac{I_1(\sigma r)}{I_1(\sigma)} \right] , \qquad (10) \]

where $I_1$ is the modified Bessel function, and $\sigma = (1 + i)(R/2)^{1/2}$. This solution is valid for arbitrary Reynolds number but may be unstable as $R$ exceeds some critical value. The properties of the resulting flow field are discussed by Herbert [6].

At higher order, it is convenient to eliminate the pressure for the periodic components by using the vorticity form of eqs. (3). At order $O(\epsilon^2)$, comparison of the equation for $v_{20}$ with the imaginary part of the equation for $w_{11}$ immediately shows that the aperiodic component of the azimuthal velocity is

\[ v_{20}(r) = -2\,\mathrm{Im}\,(w_{11}(r)) . \qquad (11) \]

This relation can be exploited to show that the despin moment of order $O(\epsilon^2)$ due to shear forces on the cylinder wall is identical with our former result.

The $\phi$-periodic components are governed by a coupled set of inhomogeneous differential equations with variable coefficients. Essential simplification at the expense of increasing the order of differentiation results from eliminating $v_{21}$ by use of the continuity equation. With some effort, the radial velocity component of $O(\epsilon^2)$ can be found in closed form,


\[ u_{21}(r) = \ldots \qquad (12) \]


where $s = \beta r$, $\beta = (i - 1)R^{1/2}$, and $J_1$, $J_2$, and $Y_2$ are Bessel functions. The coefficients $c_1$, $c_2$, $c_3$, and $c_4$ can be determined numerically.

The velocity components at order $O(\epsilon^3)$ are of interest primarily since $w_{31}$ provides the first nonlinear correction to the despin moment. In view of the effort involved in deriving the closed form solution for $u_{21}$ and the ultimate need to determine the coefficients in eq. (12) numerically, we decided to solve the differential equations for the third-order components by means of a spectral collocation method.


4.  Results  of  the  Perturbation  Analysis 

Detailed equations, results, and graphs of the various functions at relevant Reynolds numbers will be published elsewhere [7]. Here we give only a summary of the main results. The motion is governed by the axial component $w_{11}$ at order $O(\epsilon)$. Of the higher order terms, only the aperiodic term $v_{20}$ is substantial. In the cylinder's center section, these terms are in good agreement with results obtained from the Sandia code, and in excellent agreement with our own computations. All the other terms are not only of order $O(1)$ but in fact less than unity, assuring rapid convergence of the perturbation series. The contribution of $w_{31}$ to the despin moment is negligible. The $\phi$-periodic terms oscillate about zero as $r$ varies over $0 < r < 1$. Accurate representation of single high-order terms by radial Chebyshev series may require numerous expansion functions. For the total velocity field, however, the error in representing these terms is of little importance. At Reynolds numbers in the range of maximum despin moment, reasonably accurate approximations can be obtained with as few as five polynomials in the radial direction. In the azimuthal direction, the solution is governed by terms periodic in $\phi$, and by the aperiodic term $v_{20}$. Fourier series with three or five modes, therefore, provide approximations of sufficient accuracy for practical purposes.

5.  Spectral  Approximations  for  a  Finite-Length  Cylinder 

The results of the perturbation analysis suggest that a good approximation to the flow in a finite cylinder can be obtained by solving the linearized version of equations (3). Linearization can be performed in different ways. The first is a linearization in $\epsilon$, as in the perturbation analysis. The resulting equations support strong symmetries. Beyond equations (5), the solution satisfies

\[ \mathbf{v}_d(r, \phi+\pi, z) = -\mathbf{v}_d(r, \phi, z) , \qquad (13a) \]

\[ p_d(r, \phi+\pi, z) = -p_d(r, \phi, z) . \qquad (13b) \]

These relations provide a useful check on the results of the spectral code. A second linear system can be obtained by linearization in the components of $\mathbf{v}_d$. This linearization retains coupling terms such as $2\tau_\phi v_z$ in eq. (3b) which destroy the symmetries (13). The second system can be considered a special case of a third linearization about some known solution $\mathbf{v}_d^{(0)}$, $p_d^{(0)}$. The third procedure is very efficient if the solution is sought for a densely spaced sequence of parameter combinations as in flight simulations. The second system is equivalent to the third one for $\mathbf{v}_d^{(0)} = p_d^{(0)} = 0$.

The algebraic form of the equations is obtained by use of spectral collocation. The velocity components are expressed in the form

\[ v_r = \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{m=1}^{M} u_{klm}\, R_{kl}(r)\, F_l(\phi)\, Z_m(z/\lambda) \qquad (14) \]

with similar expressions for $v_\phi$, $v_z$, and $p_d$. The azimuthal functions are $F_l = \cos[(l-1)\phi/2]$ for odd $l$, $F_l = \sin[l\phi/2]$ for even $l$, where $l = 1, 2, \ldots, L$, and $L$ is odd. The azimuthal collocation points are equidistant, $\phi_l = 2\pi(l-1)/L$. If no use of the symmetries (5) is made, the axial expansion functions are the Chebyshev polynomials $Z_m = T_{m-1}(z/\lambda)$, $m = 1, 2, \ldots, M$. The collocation points are $z_m/\lambda = \cos[(m-1)\pi/(M-1)]$. In the radial direction, even or odd Chebyshev polynomials are used, depending on the quantity under consideration and the periodicity in $\phi$. The proper choice is dictated by the requirement of a unique value of all quantities on the axis $r = 0$. For example, the axial velocity component must assume a unique value independent of $\phi$ as $r \to 0$. Therefore, even polynomials are to be used if $l = 1$ while odd polynomials are to be used if $l > 1$. The radial collocation points are $r_k = \cos[(k-1)\pi/(2K-1)]$, $k = 1, 2, \ldots, K$. Consequently, $0 < r_k \le 1$, and no difficulty can arise from points on the axis. The collocation points in the radial and axial directions are concentrated near the boundary such that high resolution in this region is obtained without additional coordinate transformations.
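
As a minimal sketch, the three families of collocation points described above can be generated as follows. The function name `collocation_points` and the variable `zeta` (standing for $z/\lambda$) are illustrative assumptions and are not taken from the authors' code.

```python
import math


def collocation_points(K, L, M):
    """Return the radial, azimuthal, and axial collocation points.

    r_k    = cos[(k-1)*pi/(2K-1)],  k = 1..K   (radial)
    phi_l  = 2*pi*(l-1)/L,          l = 1..L   (equidistant, L odd)
    zeta_m = cos[(m-1)*pi/(M-1)],   m = 1..M   (axial, zeta = z/lambda)
    """
    if L % 2 == 0:
        raise ValueError("L must be odd")
    if M < 2:
        raise ValueError("M must be at least 2")
    r = [math.cos((k - 1) * math.pi / (2 * K - 1)) for k in range(1, K + 1)]
    phi = [2.0 * math.pi * (l - 1) / L for l in range(1, L + 1)]
    zeta = [math.cos((m - 1) * math.pi / (M - 1)) for m in range(1, M + 1)]
    return r, phi, zeta


# Resolution of the test run reported in the paper: K = 4, L = M = 5.
r, phi, zeta = collocation_points(K=4, L=5, M=5)
```

The radial points start at the wall, $r_1 = 1$, and stop short of the axis, which is why no point with $r = 0$ ever enters the algebraic system.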




Our implementation of the spectral method uses precalculated and stored matrices containing the values of the expansion functions and their derivatives at the collocation points. It is a straightforward matter to convert the linear system of partial differential equations derived from eqs. (3) into an algebraic system of dimension $N = 4KLM$ for the coefficients $u_{klm}$, $v_{klm}$, $w_{klm}$, and $p_{klm}$ for $v_r$, $v_\phi$, $v_z$, and $p_d$, respectively. It is not straightforward, however, to implement the homogeneous boundary conditions for the velocities at the cylinder wall and the condition on the pressure, which is only determined to within an additive constant. In principle, the boundary conditions are implemented by replacing three of the four differential equations in the boundary points. The question then is which equation should be retained and where the condition on the pressure, e.g. $p_d = 0$, should be applied. Trial-and-error leads to numerous cases with ill-determined matrices or zero determinant. In other cases, a correct solution for the velocity field is obtained, but the pressure contains a non-physical spurious term. With the velocity field given, we attempted to calculate the pressure by solving a Poisson equation with Neumann boundary conditions, but we encountered the same difficulties. Problems with calculating the pressure in closed domains with spectral methods are well-known, e.g. [8]. However, the reports of negative results are rather unspecific, and neither the origin nor methods for removal of this spurious term seem to be known.

We have therefore performed a detailed analysis of the flow in a square driven by an internal force field. This simpler two-dimensional problem exhibits all characteristics of the original problem, including the spurious pressure term. Detailed results of this study will be reported elsewhere [9]. The study reveals that the spurious term is associated with the corners of the domain. The term vanishes in all collocation points except the corners, where it may assume arbitrary values. The term can be suppressed by retaining in the corners one of the momentum equations that contain the derivative of the pressure in the direction of the boundary. In the cylinder problem, the $z$-momentum equation must be retained in order to suppress even as well as odd spurious terms. The condition on the pressure can be applied anywhere except in the corner points.

We solve the linear algebraic system for the expansion coefficients with a special subroutine based on Gauss elimination with partial pivoting. The subroutine stores all data required to solve the same system with a new right-hand side without repeating the costly ($O(N^3)$ operations) reduction of the matrix to upper triangular form. Once the solution is obtained, a new right-hand side is formed taking the nonlinear terms into account and the system is solved again. This procedure is repeated iteratively until sufficient accuracy is obtained. The procedure is equivalent to a modified Newton iteration (without updating the Jacobian in every step) and converges rapidly since the nonlinear corrections to the velocity are small while the pressure appears linearly in equations (3).
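
The factor-once, solve-repeatedly strategy can be sketched as below. This is not the authors' subroutine: the routines `lu_factor`, `lu_solve`, and `solve_fixed_jacobian`, the 2-by-2 matrix, and the quadratic term are hypothetical stand-ins chosen only to show the structure of the iteration.

```python
def lu_factor(a):
    """Gauss elimination with partial pivoting; returns (lu, perm) with P*A = L*U."""
    n = len(a)
    lu = [[float(v) for v in row] for row in a]   # work on a copy
    perm = list(range(n))                         # perm[i] = original row now at row i
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(lu[i][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            lu[col], lu[pivot] = lu[pivot], lu[col]
            perm[col], perm[pivot] = perm[pivot], perm[col]
        for i in range(col + 1, n):
            lu[i][col] /= lu[col][col]            # multiplier, stored in place of L
            for j in range(col + 1, n):
                lu[i][j] -= lu[i][col] * lu[col][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b from the stored factors in O(N^2) operations."""
    n = len(lu)
    y = [b[p] for p in perm]
    for i in range(n):                            # forward substitution (unit L)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):                  # back substitution (U)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def solve_fixed_jacobian(a, f, nonlinear, tol=1e-12, max_iter=50):
    """Iterate A x_new = f - nonlinear(x_old), reusing one factorization of A."""
    lu, perm = lu_factor(a)                       # the costly O(N^3) step, done once
    x = lu_solve(lu, perm, f)                     # linear solution as starting guess
    for _ in range(max_iter):
        rhs = [fi - ni for fi, ni in zip(f, nonlinear(x))]
        x_new = lu_solve(lu, perm, rhs)
        if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")


# Toy problem with a weak quadratic term (an assumption for illustration only).
a = [[4.0, 1.0], [1.0, 3.0]]
f = [1.0, 2.0]
weak_term = lambda x: [0.1 * x[0] ** 2, 0.1 * x[0] * x[1]]
x = solve_fixed_jacobian(a, f, weak_term)
```

Because the matrix is never refactored, each additional iteration costs only the two triangular solves, which is what makes the scheme attractive when the nonlinear correction is small.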

6.  Results  of  the  Spectral  Code

In the following, we present some preliminary results of a test run for $R = 14.95$, $\theta = 20^\circ$, $\tau = 0.1667$, and $\lambda = 4.368$, which results in $\epsilon = 0.057$. The results are for $K = 4$, $L = M = 5$, and consequently $N = 400$. Detailed convergence tests will be performed with later versions of the spectral code. Figure 1 shows the axial and radial velocity in the $x$, $z$-plane. Only the upper half, $z > 0$, of the cylinder is shown; the lower half is governed by the symmetries (5). The velocity distribution at $z = 0$ agrees well with the results of the perturbation analysis and computations with the Sandia code. Near the walls, the solution seems to be more realistic and more accurate than the Sandia results. The figure also verifies the existence of a predominantly axial flow over most of the cylinder length, except within a region of the order of the radius near the end wall. Linear and nonlinear velocity distributions are hardly distinguishable. Clearly visible is the turning of the flow near the end wall. The radial and azimuthal velocities at $z = 0.9\lambda$ are shown in figure 2. The right tick mark indicates the $x$-direction, $\phi = 0$. At $R = 14.95$, the maximum of the axial velocity occurs at $\phi \approx 45^\circ$.

Pressure distributions in the $x$, $z$-plane are given in figures 3 and 4 with the heavy lines indicating positive values. The pressure in figure 3 is obtained simultaneously with $\mathbf{v}_d$ from equations linearized in $\epsilon$, and clearly shows the symmetry (13b). Figure 4 gives the result from the nonlinear equations. It is interesting to note that a very similar pressure field can be obtained by solving the Poisson equation for the pressure with the linear velocity field. The inhomogeneous term in the Poisson equation is inherently nonlinear in the velocities. Figure 5 gives the pressure distribution across the cylinder near the end wall at $z = 0.9\lambda$. Remarkable is the formation of a high-pressure region in the corner near $\phi = 0$, which produces a large moment about the $y$-axis. Looking at a series of plots like figures 4 and 5, one may wonder whether the details of the pressure variation near the cylinder wall can be resolved with a finite difference approximation with a step size of $\Delta r = 0.1$.

The azimuthal mean velocity at $z = 0$ is shown in figure 6. The shear exerted by this component on the cylinder wall opposes the spinning motion and is the ultimate cause of the despin moment. The axial and radial mean velocity field is given in figure 7. This streaming term exhibits a toroidal motion stretched over each half of the cylinder. It is this mean velocity that causes the symmetric pattern in flow visualizations [10]. Figure 8 shows the observed pattern of the flow at $R \approx 30$, which is typical for the range of low Reynolds numbers.

7.  Discussion 

The experience with the first version of the spectral code shows that high performance can be achieved. The reported run with $N = 400$ requires 1.3 minutes CPU time on an IBM 3090 and 48 minutes on an Apollo DN300 desktop computer. The solution is obtained in semi-analytical form with only $N = 400$ numerical coefficients. This low data volume is especially attractive for communication with remote supercomputers. The code is very well suited for vectorization, since practically all CPU time is spent on constructing and solving an algebraic system. However, the code demands larger memory than other codes [3, 5]. Since 64-bit arithmetic is highly recommended for spectral methods in general, and the algebraic system requires $N(N + 1)$ words of storage, the above test requires 1.3 Mbyte of memory. Nowadays, the memory requirement appears acceptable even if higher resolution is desired.

Finally, there are various ways to improve the performance and lessen the demands. The first step is to exploit symmetry, which reduces $N$ by a factor of 2, storage by a factor of 4, and time by a factor of 8. Second, the solution process can be split into two levels, the first of which calculates only the velocity components while the pressure is obtained a posteriori by solving the Poisson equation. After these changes, the above test run will require less than 1 minute on an MC68020/68881 based desktop computer. Alternatively, runs with higher resolution can be executed within a short time on supercomputers. One may also consider reducing the storage requirement by line iteration. However, the ability to obtain a reasonably accurate solution by direct solution of the (large) algebraic system bears valuable potential to answer the question whether the steady solution is stable, and allows for analysis of unsteady motions. The design of a reliable code for the unsteady problem can profit from the knowledge of the eigenvalue spectrum for small unsteady disturbances of the steady flow.
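
The figures quoted for the test run can be checked with a few lines of arithmetic. The sketch below assumes 8 bytes per 64-bit word and a direct solver whose cost scales as $N^3$; the helper names `system_size` and `storage_bytes` are hypothetical.

```python
import math


def system_size(K, L, M, unknowns_per_point=4):
    """Dimension N = 4*K*L*M of the algebraic system (v_r, v_phi, v_z, p_d)."""
    return unknowns_per_point * K * L * M


def storage_bytes(n, bytes_per_word=8):
    """Storage for the matrix plus one right-hand side: N*(N+1) words."""
    return n * (n + 1) * bytes_per_word


n = system_size(K=4, L=5, M=5)                      # resolution of the test run
eps = 0.1667 * math.sin(math.radians(20.0))         # epsilon = tau * sin(theta)
mem = storage_bytes(n)                              # bytes for 64-bit words

# Effect of halving N by exploiting the symmetries (5).
half = n // 2
storage_ratio = storage_bytes(half) / mem           # close to 1/4
time_ratio = (half / n) ** 3                        # 1/8 for an O(N^3) solver
```

With these inputs the storage comes out slightly below 1.3 Mbyte, in line with the figure stated in the text.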
Figure 1. Vector plot of the velocity field in the $x$, $z$-plane for $z > 0$.

Figure 2. Vector plot of the velocity field across the cylinder at $z/\lambda = 0.9$.

Figure 3. Contour plot of the linear pressure field in the $x$, $z$-plane for $z > 0$.

Figure 4. Contour plot of the nonlinear pressure field in the $x$, $z$-plane for $z > 0$.

Figure 5. Contour plot of the pressure field across the cylinder at $z/\lambda = 0.9$.

Figure 6. Vector plot of the mean velocity field across the cylinder at $z/\lambda = 0$.

Figure 7. Vector plot of the mean velocity field in the $x$, $z$-plane for $z > 0$.

Figure 8. Pattern of the fluid motion at low Reynolds numbers ($R \approx 30$) in the $x$, $z$-plane.
Abstract


A study is made of the temporal and spatial development of subharmonic
edge packets which are resonantly excited by wavetrains normally incident
and  reflected  on  a  mildly  sloping  beach.  The  nonlinear  evolution  equations 
of  the  wavepacket  envelopes  are  solved  numerically  for  a  variety  of  initial 
conditions.  It  is  found  that,  under  certain  conditions,  large-scale 
modulations  of  edge  waves  develop  and  undergo  a  recurrence  phenomenon. 

These  findings  support  the  view  that  large-scale  modulations  of  edge  waves 
may  account  for  certain  longshore  phenomena  observed  on  natural  beaches. 


(To  appear  in  Wave  Motion) 
A UNIFIED APPROACH TO MASS PROPERTY COMPUTATIONS
IN A SOLID MODELING ENVIRONMENT
WITH APPLICATION TO HYDRAULIC STRUCTURES

Vicksburg, Mississippi 39180


ABSTRACT. Several of the practicing engineers in the Corps of Engineers were doing overturning and sliding analyses of hydraulic structures by hand before a Three-Dimensional Stability Analysis/Design (3DSAD) computer program was developed for doing this. A general purpose approach with solid modeling type capability was taken for describing the geometry and loads so that volume, weight, and centroid information (mass properties), as well as resulting forces and moments, are a natural byproduct (and sometimes the more important product) of the program. In the development of 3DSAD certain algorithms for computing areas and volumes for curved shapes in a concise, consistent, and very computationally efficient way were developed. This paper will describe these algorithms and demonstrate their use for specific hydraulic structures (dams, locks, cooling towers, etc.).


I. INTRODUCTION. A Three-Dimensional Stability Analysis/Design (3DSAD) program [1, 2, 3] for analyzing and designing concrete structures has been developed and successfully used. This program models the geometry and loads in a general way and then applies this modeling capability to specific structure types when possible. Examples include dams with overflow, nonoverflow, and pier sections; gravity and U-frame locks; cooling towers; etc. If the structure does not conform to a predefined standard shape, the program can still do a general analysis of the problem because of the fundamental approach of using solid modeling type capability as a foundation.

There are several ways to model geometry and many commercial packages exist for this [4]. The approach taken in 3DSAD was to provide some of this capability internal to the program and then at a later date supply the capability to "hook" 3DSAD with the larger systems.

II. MODELING TECHNIQUE. 3DSAD models geometry three ways:

a. Blocks - 2-D cross-sections swept normal to the cross-section in either a linear or axisymmetric way with linear and quadratic tapering for the linear sweeps.

b. Eight node brick elements.

c. A boundary representation consisting of planar faces and bicubic patches.

III. MASS PROPERTIES - BLOCK. One of the major points of this paper is to describe the equations for computing mass properties for blocks. The blocks are formulated by first defining a cross-section and then parametrically doing a constant or tapered linear sweep or an axisymmetric sweep. Thus, area integrals can first be computed, and from these the mass properties can be obtained by performing the integration in the third dimension.

For a cross-section defined by a polygon it is best to convert the area integrals to line integrals [5]. For a cross-section consisting of both curved and straight line segments, it is tempting to first approximate the curved edges with straight line segments and then do the line integrals. However, this process is time consuming and creates unnecessary errors, since it is possible to formulate line integrals for the curved edges as well. This paper will now present examples of these line integrals developed in a consistent manner.

Figure 1. Blocks with constant cross-section

The volume of a block with constant cross-section (Figure 1) having $n$ sides can be expressed as follows:

\[ V = -L \oint_C y\,dx = -L \sum_{j=1}^{n} \int_{\Gamma_j} y\,dx = -L \sum_{j=1}^{n} I_j \]

The integral $I$ can now be evaluated for different line segment types.

Linear line segment. Suppose the line segment is defined by two points $(X_1, Y_1)$ - $(X_2, Y_2)$. Then

\[ X = X_1 + (X_2 - X_1)\,S \]
\[ Y = Y_1 + (Y_2 - Y_1)\,S \]

where $S$ varies between 0 and 1. $I$ can be done as follows:

\[ I = \int_0^1 \left[ Y_1 + (Y_2 - Y_1)\,S \right] (X_2 - X_1)\,dS = \int_0^1 A_1 \sum_{k=0}^{1} B_k S^k\,dS = A_1 \sum_{k=0}^{1} \frac{B_k}{k+1} \]
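
A minimal sketch of the straight-segment formula follows, applied edge by edge to a polygon. It assumes the vertices are listed counter-clockwise so that $-\oint y\,dx$ is the positive area; the names `segment_integral` and `block_volume` are illustrative and are not taken from 3DSAD.

```python
def segment_integral(p1, p2):
    """I = integral of y dx along the straight segment p1 -> p2.

    With X = X1 + A1*S and Y = B0 + B1*S (0 <= S <= 1),
    I = A1 * sum_k B_k / (k + 1).
    """
    (x1, y1), (x2, y2) = p1, p2
    a1 = x2 - x1
    b = (y1, y2 - y1)
    return a1 * sum(bk / (k + 1) for k, bk in enumerate(b))


def block_volume(vertices, length):
    """V = -L * sum_j I_j for a constant cross-section swept over a length L."""
    n = len(vertices)
    total = sum(segment_integral(vertices[j], vertices[(j + 1) % n])
                for j in range(n))
    return -length * total


# Right triangle with legs 4 and 3 (area 6), swept over a length of 2.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
volume = block_volume(triangle, 2.0)
```

Reversing the vertex order changes the sign of the result, which is a convenient check on the orientation of an input cross-section.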
Quadratic line segment. In a similar manner a quadratic line segment formed by three points 1, 2, and 3 can be expressed as follows:

\[ X = [X_1 \;\; X_2 \;\; X_3]\, M\, [1 \;\; S \;\; S^2]^T = \sum_{k=0}^{2} A_k S^k \]

\[ Y = \sum_{L=0}^{2} B_L S^L \]

\[ dX = \sum_{k=0}^{1} (k+1)\, A_{k+1} S^k\,dS \]

Here $M$ is the "magic" matrix of known constants, and as in the linear case, the A's and B's are constants of the parametric polynomials. The line integral $I$ can now be easily evaluated as follows:

\[ I = \int_0^1 \left( \sum_{L=0}^{2} B_L S^L \right) \left( \sum_{k=0}^{1} (k+1)\, A_{k+1} S^k \right) dS = \sum_{L=0}^{2} \sum_{k=0}^{1} \frac{k+1}{L+k+1}\, A_{k+1} B_L \]
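
The double sum can be sketched as follows. The paper does not print the matrix $M$; the sketch assumes the three points correspond to $S = 0$, $S = 0.5$, and $S = 1$ (the same parameter values the paper uses for the taper scale factors), which fixes the coefficients computed in the hypothetical helper `quadratic_coeffs`.

```python
def quadratic_coeffs(p1, p2, p3):
    """Coefficients (A0, A1, A2) of the parabola through p1, p2, p3.

    Assumes the points are taken at S = 0, 0.5, and 1, respectively.
    """
    return (p1,
            -3.0 * p1 + 4.0 * p2 - p3,
            2.0 * p1 - 4.0 * p2 + 2.0 * p3)


def quadratic_segment_integral(points):
    """I = integral of Y dX = sum_L sum_k (k+1)/(L+k+1) * A_{k+1} * B_L."""
    (x1, y1), (x2, y2), (x3, y3) = points
    a = quadratic_coeffs(x1, x2, x3)
    b = quadratic_coeffs(y1, y2, y3)
    return sum((k + 1) / (l + k + 1) * a[k + 1] * b[l]
               for l in range(3) for k in range(2))


# Arc of the parabola y = x^2 from (0, 0) to (1, 1); here X = S and Y = S^2,
# so the integral of y dx equals 1/3.
parabola = [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)]
integral = quadratic_segment_integral(parabola)
```

The result is exact for any parabolic edge, with no subdivision into straight segments.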
Note also at this point that higher order integrals, say for the x centroid, can be done with equal ease. With $A$ denoting the area of the cross-section,

\[ \bar{X} = -\frac{1}{A} \oint_C X Y\,dX \]

\[ I = \sum_{L=0}^{2} \sum_{M=0}^{2} \sum_{k=0}^{1} \frac{k+1}{L+M+k+1}\, A_L B_M A_{k+1} \]

The general trend is now set with the form being the same for any polynomial. Also, another x or y inside the integral simply results in another summation sign.


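Circular arc. The circular arc can also be put in the same consistent notation with use of complex variables as follows:

\[ X = R\cos(\Delta S + \theta_0) , \qquad Y = R\sin(\Delta S + \theta_0) , \qquad W = e^{i\Delta S} \]

\[ X = \sum_{K=-1}^{1} A_K W^K , \qquad Y = \sum_{L=-1}^{1} B_L W^L \]

\[ dX = i\Delta \sum_{K=-1}^{1} K A_K W^K\,dS \]

With this foundation the line integrals can be done as easily as the polynomials.

\[ I = \int_0^1 \left( \sum_{L=-1}^{1} B_L W^L \right) \left( i\Delta \sum_{K=-1}^{1} K A_K W^K \right) dS = \sum_{L=-1}^{1} \sum_{K=-1}^{1} \frac{K}{K+L}\, A_K B_L \left( e^{i\Delta(K+L)} - 1 \right) \]

For $K + L = 0$ the factor $K\,(e^{i\Delta(K+L)} - 1)/(K+L)$ is to be read as its limit $i\Delta K$.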
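
A short numerical sketch of the arc formula is given below. The coefficients $A_K$ and $B_L$ are obtained by writing the cosine and sine as complex exponentials; the function name `arc_integral` and the use of $\Delta$ for the swept angle are assumptions of this sketch.

```python
import cmath
import math


def arc_integral(radius, theta0, delta):
    """I = integral of Y dX along X = R cos(delta*S + theta0),
    Y = R sin(delta*S + theta0), 0 <= S <= 1."""
    e = cmath.exp(1j * theta0)
    # X = sum_K A_K W^K and Y = sum_L B_L W^L with W = exp(i*delta*S)
    a = {-1: radius / (2 * e), 0: 0.0, 1: radius * e / 2}
    b = {-1: -radius / (2j * e), 0: 0.0, 1: radius * e / 2j}
    total = 0j
    for l in (-1, 0, 1):
        for k in (-1, 0, 1):
            if k + l == 0:
                factor = 1j * delta * k            # limit of the general factor
            else:
                factor = k * (cmath.exp(1j * delta * (k + l)) - 1) / (k + l)
            total += factor * a[k] * b[l]
    return total.real                               # imaginary parts cancel


# Quarter of the unit circle starting on the x-axis: the integral of y dx
# is -pi/4, i.e. minus the area between the arc and the x-axis.
quarter = arc_integral(1.0, 0.0, math.pi / 2)
```

Only the terms with $K + L = 0$ grow with the swept angle; the remaining terms depend on the end points of the arc alone.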

Elliptical arc. An elliptical arc can also be handled in the same consistent manner.

\[ X = A\cos(\Delta S + \theta_0) , \qquad Y = B\sin(\Delta S + \theta_0) \]

\[ X = \sum_{K=-1}^{1} A_K W^K , \qquad Y = \sum_{L=-1}^{1} B_L W^L \]

Tapered block. Consider now a cross-section defined in the x-y plane. Let $(x_0, y_0)$ be a point on that cross-section. The (x, y) value of a point on a cross-section at elevation z as a result of a taper is

\[ x(z) = (x_0 - x_A)\,S_x(z) + x_A \]

\[ y(z) = (y_0 - y_A)\,S_y(z) + y_A \]

where $(x_A, y_A)$ is an apex point, and $S_x(z)$ and $S_y(z)$ are scale factors that range in value from zero to one. The scale factor functions determine the type of tapering. Let z vary from zero to L and let the scale factor functions vary quadratically as follows:

\[ Z = L\,S \]

\[ S_x = [1 \;\; H_x \;\; F_x]\, M\, [1 \;\; S \;\; S^2]^T = \sum_{j=0}^{2} A_j S^j \]

\[ S_y = \sum_{k=0}^{2} B_k S^k \]


where $H_x$ and $F_x$ are the values of the x scale factor for $S = 0.5$ and $S = 1$, respectively. Let a similar expression for the y scale factor also be defined. Let $A$ be the area of a cross-section at elevation z. Then the volume is computed by

\[ A = A_0\, S_x S_y \]

\[ V = \int_0^L A\,dZ = A_0 \int_0^1 \left( \sum_{j=0}^{2} A_j S^j \right) \left( \sum_{k=0}^{2} B_k S^k \right) L\,dS = A_0 L \sum_{j=0}^{2} \sum_{k=0}^{2} \frac{1}{j+k+1}\, A_j B_k \]
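
The taper formula can be sketched as follows. The scale factor is taken to be 1 at $S = 0$, $H$ at $S = 0.5$, and $F$ at $S = 1$, as stated in the text; the explicit interpolation coefficients in `quadratic_coeffs` follow from that assumption rather than from a printed matrix $M$, and the function names are hypothetical.

```python
def quadratic_coeffs(p0, p_half, p1):
    """Coefficients of the parabola taking the values p0, p_half, p1
    at S = 0, 0.5, and 1."""
    return (p0,
            -3.0 * p0 + 4.0 * p_half - p1,
            2.0 * p0 - 4.0 * p_half + 2.0 * p1)


def tapered_volume(base_area, length, hx, fx, hy, fy):
    """V = A0 * L * sum_j sum_k A_j * B_k / (j + k + 1)."""
    a = quadratic_coeffs(1.0, hx, fx)   # x scale factor S_x(S)
    b = quadratic_coeffs(1.0, hy, fy)   # y scale factor S_y(S)
    return base_area * length * sum(a[j] * b[k] / (j + k + 1)
                                    for j in range(3) for k in range(3))


# Linear taper to a point in both directions (a pyramid): V = A0 * L / 3.
pyramid = tapered_volume(base_area=9.0, length=4.0, hx=0.5, fx=0.0, hy=0.5, fy=0.0)

# No taper at all (a prism): V = A0 * L.
prism = tapered_volume(base_area=9.0, length=4.0, hx=1.0, fx=1.0, hy=1.0, fy=1.0)
```

Tapering in one direction only (a wedge) gives half the prism volume, which provides a third easy check.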
IV. MASS PROPERTIES - FACES. Two types of faces will be considered:

a. Planar face.

b. Bicubic patch.

Planar face. By using the equation of the plane, the mass property integrals for planar faces can also be converted to line integrals and then computed the same way as before. For example, the volume under a planar face can be computed as follows:

\[ V = \iint_A z\,dx\,dy = \iint_A (Ax + By + C)\,dx\,dy = -\oint_C (Ax + 0.5\,By + C)\,y\,dx \]

These volumes when summed over all the faces will give the correct volume for the solid.
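
The line-integral form for a planar face can be sketched as below for a polygonal outline. Instead of the paper's coefficient sums, this sketch evaluates each edge with Simpson's rule, which is exact here because the integrand is quadratic in the edge parameter; the name `volume_under_plane` and the counter-clockwise vertex order are assumptions.

```python
def volume_under_plane(vertices, a, b, c):
    """Volume between the plane z = a*x + b*y + c and the x-y plane over a
    counter-clockwise polygon: V = -closed integral of (a*x + 0.5*b*y + c)*y dx."""
    def integrand(x, y):
        return (a * x + 0.5 * b * y + c) * y

    total = 0.0
    n = len(vertices)
    for j in range(n):
        (x1, y1), (x2, y2) = vertices[j], vertices[(j + 1) % n]
        xm, ym = 0.5 * (x1 + x2), 0.5 * (y1 + y2)
        # Simpson's rule along the edge; dx = (x2 - x1) dS
        total += (x2 - x1) * (integrand(x1, y1)
                              + 4.0 * integrand(xm, ym)
                              + integrand(x2, y2)) / 6.0
    return -total


# Plane z = x + 2y + 3 over the unit square: the exact volume is 4.5.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
volume = volume_under_plane(square, 1.0, 2.0, 3.0)
```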


Bicubic patch. Bicubic patches can be defined in several ways. Whatever the boundary conditions or formulation, they can typically be cast as follows:

\[ x = \sum_{i=0}^{3} \sum_{j=0}^{3} A_{ij}\, S^i t^j \]

\[ y = \sum_{k=0}^{3} \sum_{l=0}^{3} B_{kl}\, S^k t^l \]

\[ z = \sum_{m=0}^{3} \sum_{n=0}^{3} C_{mn}\, S^m t^n \]
Mass properties can be done with numerical integration or exactly. The exact formulation is

\[ V = \iint_A z\,dx\,dy = \int_0^1 \int_0^1 z\,|J|\,dS\,dt = \sum_{i=0}^{3} \sum_{j=0}^{3} \sum_{k=0}^{3} \sum_{l=0}^{3} \sum_{m=0}^{3} \sum_{n=0}^{3} T_{ijklmn} \]

\[ T_{ijklmn} = \frac{(il - jk)\, A_{ij} B_{kl} C_{mn}}{(i+k+m)\,(j+l+n)} \]

V. LOADS. Loads are presently modeled using point loads and volumes with assigned directions (called "pressure-volumes"). The preferred method is to integrate the pressures over the surfaces [6] as follows:

\[ \vec{F} = -\int_S p\,d\vec{S} \]

\[ \vec{M} = -\int_S \vec{r} \times p\,d\vec{S} \]

Here $\vec{F}$ is the force, $\vec{M}$ is the moment, and $p$ is the pressure for a given face.
VI. EXAMPLES. Shown below are a few examples of the use of the geometry building features described in this paper.


Methods have been presented for computing, in an efficient and consistent way, the mass properties of solids. The program 3DSAD incorporating these techniques has proven very useful and successful to the U. S. Army Corps of Engineers.
A COMMONSENSE THEORY OF NONMONOTONICITY

ABSTRACT

A commonsense theory of nonmonotonic reasoning is presented which models our intuitive ability to reason about defaults. The concepts of this theory do not involve mathematical fixed points, but instead are explicitly defined in a monotonic modal quantificational logic which captures the modal notion of logical truth. The axioms and inference rules of this modal logic are described therein along with the derivation of the basic theorems about nonmonotonic reasoning. A comparison to the "fixed point" theories of nonmonotonicity is made and some simple applications to deontic nonmonotonic reasoning and the frame problem in robot plan formation are presented.


"For  this  formula  game  is  carried  out  according  to  certain  definite  rules,  in  which  the  technique  of  our 
thinking  is  expressed.  These  rules  form  a  closed  system  that  can  be  discovered  and  definitively  stated. 
The  fundamental  idea  of  my  proof  theory  is  none  other  than  to  describe  the  activity  of  our  understanding, 
to  make  a  protocol  of  the  rules  according  to  which  our  thinking  actually  proceeds.” 

[David  Hilbert  1927] 


1.  INTRODUCTION 

The literature on the nature and representation of nonmonotonicity is full of disputes and contradictory theories.
This is surprising because the nature of nonmonotonic reasoning does not cause any worry for people in their
everyday coping with the world. For example, without any worry people are quite happy to assert the default that birds
fly even though penguins are birds which do not fly. This suggests that there is some form of commonsense
knowledge about nonmonotonicity that is rich enough to enable people to deal with the world and which is universal
enough to enable cooperation and communication between people. In this paper we propose such a theory.

The basic idea of our theory of nonmonotonicity is that nonmonotonicity is already encompassed in the normal
intentional logic of everyday commonsense reasoning and can be explained precisely in that terminology.

For example, a knowledgebase consisting of a simple default axiom expressing that a particular bird flies
whenever "that bird flies" is possible with respect to what is assumed is stated as:

(that  which  is  assumed  is 

(if  (A  is  possible  with  respect  to  what  is  assumed)  then  A)) 

where  A  stands  for  the  proposition  that  that  particular  bird  flies. 

Reflection on the meaning of this knowledgebase leads immediately to the conclusion that either A is logically
possible and the knowledgebase is synonymous to A, or A is not logically possible and the knowledgebase is
synonymous to logical truth. This conclusion is obtained by simple case analysis: for if A is possible with respect to
what is assumed then, since truth implies A is A, that which is assumed is indeed A. Since that which is assumed is
A, A is possible with respect to what is assumed only if A is logically possible. On the other hand, if A is not
possible with respect to what is assumed then, since falsity implies A is just truth, that which is assumed is truth.
Since that which is assumed is truth, A is not possible with respect to what is assumed only if A is not logically
possible.

Thus  if  it  is  further  assumed  that  A  is  logically  possible,  then  it  follows  that  the  knowledgebase  is  synonymous 
to  A  itself. 

The  nonmonotonic  nature  of  these  expressions  becomes  apparent  if  an  additional  proposition  that  that  particular 
bird  does  not  fly  is  added  to  the  knowledgebase: 

(that  which  is  assumed  is 
(and  (not  A) 

(if  (A  is  possible  with  respect  to  what  is  assumed)  then  A))) 

Reflection on this new knowledgebase leads immediately to the conclusion that it is synonymous to not A. This
conclusion is again obtained by simple case analysis: for if A is possible with respect to what is assumed then,
since truth implies A is just A, that which is assumed is indeed not A and A, which is falsity. Since that which is
assumed is falsity, A is possible with respect to what is assumed only if falsity is logically possible, which it is
not. Thus A is not possible with respect to what is assumed. On
the other hand, if A is not possible with respect to what is assumed then, since falsity implies A is just truth, that
which is assumed is just not A. Since that which is assumed is (not A), A is not possible with respect to what is
assumed only if A and (not A) is not logically possible, which is the case. Thus it follows that the knowledgebase is
synonymous to (NOT A).

Therefore whereas the original knowledgebase was synonymous to A, the new knowledgebase, obtained by adding
(not A), is synonymous, not to falsity, but to (not A) itself.
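
The two case analyses above can be checked mechanically in a small finite model. The following Python sketch is an illustration only: it assumes a single generator (the proposition that the bird flies), represents each proposition as the set of worlds in which it is true, and searches for every K satisfying each knowledgebase. The names `possible`, `synonymous`, and `solutions` are hypothetical helpers, not part of the theory's notation.

```python
from itertools import combinations, product

# Assumed finite model: one generator ("the bird flies"), hence two worlds.
WORLDS = frozenset(product([False, True], repeat=1))
T, NIL = WORLDS, frozenset()                  # truth and falsity
A = frozenset(w for w in WORLDS if w[0])      # worlds in which the bird flies

def NOT(p): return WORLDS - p
def AND(p, q): return p & q
def IMPLY(p, q): return NOT(p) | q
def possible(p): return T if p else NIL       # logically possible: true in some world
def synonymous(p, q): return p == q           # true in exactly the same worlds

# Every proposition of the model (all subsets of the set of worlds).
PROPOSITIONS = [frozenset(c) for r in range(len(WORLDS) + 1)
                for c in combinations(sorted(WORLDS), r)]

def solutions(body):
    """All K for which 'that which is assumed is body(K)' holds."""
    return [k for k in PROPOSITIONS if synonymous(k, body(k))]

first = solutions(lambda k: IMPLY(possible(AND(k, A)), A))
second = solutions(lambda k: AND(NOT(A), IMPLY(possible(AND(k, A)), A)))

print(first == [A])        # True: the first knowledgebase is synonymous to A
print(second == [NOT(A)])  # True: adding (not A) gives (not A), not falsity
```

In this model each knowledgebase has exactly one solution, which agrees with the conclusions reached by case analysis.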

These simple intuitive nonmonotonic arguments involve logical concepts such as not, implies, truth, falsity,
logical possibility, possibility with respect to some assumed knowledgebase, and synonymity to a knowledgebase.
The concepts: not, implies, truth (i.e. T), and falsity (i.e. NIL) are all concepts of (extensional) quantificational logic
and are well known. The remaining concepts: logical possibility, possibility with respect to something, and
synonymity of two things can be defined in a very simple modal logic extension of quantificational logic, which we call
Z [Brown1,2,3,4]. The axiomatization of the modal logic Z is described in detail in section 2. But briefly, it
consists of (extensional) quantificational logic plus the intentional concept of something being logically true written as
the unary predicate: (LT P). The concept of a proposition P being logically possible and the concept of two
propositions being synonymous are then defined as:

(POS P) =df (NOT(LT(NOT P)))    ;P is logically possible
(SYN P Q) =df (LT(IFF P Q))     ;P is synonymous to Q

The above knowledgebases and arguments can be formalized in the modal quantificational logic Z by letting some
letter such as K stand for the knowledgebase under discussion. The idiom "that which is assumed is X" can then be
rendered to say that K is synonymous to X, and the idiom "X is possible with respect to what is assumed" can be
rendered to say that K and X is possible:

(that which is assumed is X) = (SYN K X)

(X is possible with respect to what is assumed) = (POS(AND K X))

These two idioms are indexical symbols referring implicitly to some particular knowledgebase K under discussion.

The knowledgebase referenced by the (X is possible with respect to what is assumed) idiom is always the meaning of
the symbol generated by the enclosing (that which is assumed is X) idiom. Each occurrence of the (that which is
assumed is X) idiom always generates a symbol (unique to the theory being discussed) to stand for the database under
discussion.

The first knowledgebase is then expressed as:

(SYN  K(IMPLY(POS(AND  K  A))A)) 

Its  commonsense  argument  could  be  carried  out  in  the  following  steps: 

(IF (POS(AND K A))
    (SYN K(IMPLY T A))
    (SYN K(IMPLY NIL A)) )

(IF (POS(AND K A))
    (SYN K A)
    (SYN K T) )

(OR (AND (POS(AND K A)) (SYN K A))
    (AND (NOT (POS(AND K A))) (SYN K T)) )

(OR (AND (POS(AND A A)) (SYN K A))
    (AND (NOT (POS(AND T A))) (SYN K T)) )

(OR (AND (POS A) (SYN K A))
    (AND (NOT (POS A)) (SYN K T)) )

by equality substitution using the following derived rules of inference of the modal quantificational logic Z.

(g(POS P)) = (IF(POS P)(g T)(g NIL))

(IMPLY T A) = A
(IMPLY NIL A) = T

(IF P L R) = (OR(AND P L)(AND(NOT P)R))

(AND(P X Y)(SYN X Y)) = (AND(P Y Y)(SYN X Y))

(AND A A) = A
(AND T A) = A

Furthermore if A is logically possible: (POS A) then further simplification using laws about AND and OR yields the
fact that the knowledgebase K is synonymous to A:

(OR  (AND  T  (SYN  K  A)) 

(AND  NIL  (SYN  K  T)) ) 

(SYN  K  A) 
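
The derived rules used in this derivation can be spot-checked by exhaustive evaluation in a finite model. The Python sketch below is illustrative only: it assumes a model with two generators (four worlds, sixteen propositions), takes (LT p) to be T exactly when p is true in every world, and tests the case-split and substitution rules on the particular contexts that occur in the derivation. The helper `g` is a hypothetical stand-in for the context written g above.

```python
from itertools import combinations, product

# Assumed model: two generators give four worlds and sixteen propositions.
WORLDS = frozenset(product([False, True], repeat=2))
T, NIL = WORLDS, frozenset()

def NOT(p): return WORLDS - p
def AND(p, q): return p & q
def OR(p, q): return p | q
def IMPLY(p, q): return NOT(p) | q
def IFF(p, q): return AND(IMPLY(p, q), IMPLY(q, p))
def LT(p): return T if p == T else NIL        # logical truth
def POS(p): return NOT(LT(NOT(p)))            # logical possibility
def SYN(p, q): return LT(IFF(p, q))           # synonymy
def IF(p, l, r): return OR(AND(p, l), AND(NOT(p), r))

PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(sorted(WORLDS), r)]

# (IMPLY T A) = A, (IMPLY NIL A) = T, (AND A A) = A, (AND T A) = A
simplifications_hold = all(
    IMPLY(T, a) == a and IMPLY(NIL, a) == T and
    AND(a, a) == a and AND(T, a) == a
    for a in PROPS)

def g(v, k, a):
    """The context of the first derivation step: (SYN K (IMPLY v A))."""
    return SYN(k, IMPLY(v, a))

# (g (POS P)) = (IF (POS P) (g T) (g NIL)), with P = (AND K A)
case_split_holds = all(
    g(POS(AND(k, a)), k, a) == IF(POS(AND(k, a)), g(T, k, a), g(NIL, k, a))
    for k in PROPS for a in PROPS)

# Substituting Y for X is sound in the presence of (SYN X Y).
substitution_holds = all(
    AND(POS(AND(x, a)), SYN(x, y)) == AND(POS(AND(y, a)), SYN(x, y))
    for x in PROPS for y in PROPS for a in PROPS)

print(simplifications_hold, case_split_holds, substitution_holds)
```

A check of this kind does not prove the rules in Z; it only confirms that they hold in one finite model of the assumed form.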

The  second  knowledgebase  is  then  expressed  as: 

(SYN K(AND(NOT A)(IMPLY(POS(AND K A))A)))


Its  commonsense  argument  could  be  carried  out  in  the  following  steps: 

(IF  (POS(AND  K  A)) 

(SYN  K(AND(NOT  A)(IMPLY  T  A))) 

(SYN  K(AND(NOT  A)(IMPLY  NIL  A))) ) 

(IF  (POS(AND  K  A)) 

(SYN  K(AND(NOT  A)A)) 

(SYN  K(AND(NOT  A)T)) ) 

(IF  (POS(AND  K  A)) 

(SYN  K  NIL) 

(SYN  K(NOT  A)) ) 

(OR  (AND  (POS(AND  K  A))  (SYN  K  NIL)) 

(AND  (NOT  (POS(AND  K  A)))  (SYN  K(NOT  A))) ) 

(OR  (AND  (POS(AND  NIL  A))  (SYN  K  NIL)) 

(AND  (NOT  (POS(AND(NOT  A)A)))  (SYN  K(NOT  A))) ) 

(OR  (AND  (POS  NIL)  (SYN  K  NIL)) 

(AND  (NOT  (POS  NIL))  (SYN  K(NOT  A))) ) 

(OR  (AND  NIL  (SYN  K  NIL)) 

(AND  (NOT  NIL)  (SYN  K(NOT  A))) ) 

(OR  NIL 

(AND  T  (SYN  K(NOT  A))) ) 

(SYN  K(NOT  A)) 


These knowledgebases have been expressed solely in terms of the modal quantificational logic Z. In particular, the
nonmonotonic concepts were explicitly defined in this logic. The intuitive arguments about the meaning of these
nonmonotonic knowledgebases have been carried out solely in the modal quantificational logic Z. Most importantly,
our commonsense understanding and reasoning about nonmonotonicity is directly represented by the inference steps
of this formal theory. Therefore, it is clear that nonmonotonic reasoning needs no special axioms or rules of
inference because it is already inherent in the normal intentional logic of everyday commonsense reasoning as modeled by
the modal quantificational logic Z.

The modal quantificational logic Z is described in section 2. This is followed in section 3 by the presentation of
the basic theorems of our nonmonotonic theory. Section 4 compares our theory of nonmonotonicity with a number
of other theories which have appeared in the literature. More complex examples of nonmonotonic reasoning are
given in section 5. And finally, a few conclusions are drawn in section 6.


2. THE MODAL QUANTIFICATIONAL LOGIC: Z


Our theory of commonsense intentional reasoning is a simple modal logic [Lewis] that captures the notion of
logical truth. The symbols of this modal logic consist of the symbols of (extensional) quantificational logic plus the
primitive modal symbolism: (LT p) which is truth whenever the proposition p is logically true. Propositions are
intuitively the meanings of sentences. For example, the sentences: '(IMPLY p q) and '(OR(NOT p)q) both mean that p
implies q. Thus, although these two sentences are different, the two propositions: (IMPLY p q) and (OR(NOT p)q)
are the same. Propositions may be true or false in a given world, but with the exception of the true proposition (i.e.
the meaning of '(IMPLY p p)) and the false proposition (i.e. the meaning of '(AND p(NOT p))), propositions are not
inherently true or false. Thus mathematically, propositions may be thought of as being the elements of a complete
atomic Boolean algebra with an arbitrary (possibly infinite) number of generators. The introduction of propositions
as objects of reasoning is not an unreasonable thing to do, but instead should be viewed in the general context of
extending mathematics with ideal entities such as irrational numbers, complex numbers, and infinitesimals [Robinson].
[Hilbert] points out that the introduction of such ideal entities is an important technique of mathematical reasoning
limited only by the need of consistency. In fact the introduction of truthvalues for the nonfinitistic sentences of first
order quantificational logic is in itself an extension of traditional mathematical reasoning with ideal entities.

The  axioms  and  inference  rules  of  this  modal  logic  include  the  axioms  and  inference  rules  of  (extensional) 
quantificational  logic  similar  to  that  used  by  Frege  in  Begriffsschrift[Frege],  plus  the  following  inference  rule  and 
axioms  about  the  concept  of  logical  truth. 

The  Modal  Logic  Z 

R0: from p infer (LT p)

A1: (IMPLY(LT P) P)

A2:  (IMPLY(LT(IMPLY  P  Q))  (IMPLY(LT  P)(LT  Q))) 

A3:  (OR(LT  P)  (LT(NOT(LT  P)))) 

A4:  (IMPLY (ALL  Q(IMPLY (WORLD  Q)(LT(IMPLY  Q  P))))  (LT  P)) 

A5:  (ALL  S(POS(meaning  of  the  generator  subset  S))) 

The inference rule R0 means that (LT p), i.e. that p is logically true, may be inferred from the assertion of p;
whatever is asserted is thus implicitly asserted to be logically true. The consequence of this rule is that a
proposition P may be asserted to be logically true by writing just:

P 

and  that  a  proposition  P  is  asserted  to  be  true  in  a  particular  world  or  state  of  affairs  W  by  writing: 

(LT(IMPLY  W  P)) 

The axiom A1 means that if P is logically true then P. Axiom A2 means that if it is logically true that P implies Q
then if P is logically true then Q is logically true. Axiom A3 means that P is logically true or it is logically true
that P is not logically true. The inference rule R0 and the axioms A1, A2 and A3 constitute an S5 modal logic. A
good introduction to modal logic in general and in particular to the properties of the S5 modal logic is given in
[Hughes and Cresswell]. Minor variations of the axioms A1, A2, and A3 were shown in [Carnap] to hold for the
modal concept of logical truth. We believe that the additional axioms, namely A4 and A5, are needed in order to
precisely capture the notion of logical truth. One important theorem scheme of S5 is: (IFF(ALL X(LT(p X)))(LT(ALL
X(p X)))) (see [Hughes and Cresswell]) which shows that a property p is logically true for everything iff it is logically
true that for everything p holds. The consequence of allowing quantification through modal contexts [Marcus] such
as in (ALL X(LT(p X))) is that the meanings of the expressions substituted for variables are concepts of objects and
not the objects themselves. However, as [Carnap] explains, in a most precise manner, this does not mean that the
real objects of the world are never denoted by such expressions because in a world a concept of an object is equivalent
to every other concept of that object, and thus all such concepts then denote that object. Thus, as shown in [Carnap],
there is no fundamental problem with quantifying through modal contexts. For example, the often quoted example
of [Quine], criticizing quantified modal logic, that: (EQUAL A(THE X(AND P(EQUAL X A)))) which he believes
should necessarily be true when P is true, is, as one would suspect in our system, not logically true, but merely true
in any world in which P is true. It should be noted that this sentence is not even always true in extensional logic for
it is equivalent to: (EQUAL A(THE X NIL)) in any world in which P is false (i.e. NIL).
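
As an illustration of the S5 fragment, the axioms A1, A2 and A3 can be evaluated exhaustively in a finite model. The Python sketch below is a check under stated assumptions rather than a proof: propositions are taken to be sets of worlds over two generators, and (LT p) is T exactly when p is true in every world. The variable names are illustrative.

```python
from itertools import combinations, product

# Assumed model: propositions are subsets of a four-element set of worlds.
WORLDS = frozenset(product([False, True], repeat=2))
T, NIL = WORLDS, frozenset()

def NOT(p): return WORLDS - p
def OR(p, q): return p | q
def IMPLY(p, q): return NOT(p) | q
def LT(p): return T if p == T else NIL   # logically true: true in every world

PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(sorted(WORLDS), r)]

# A1: (IMPLY (LT P) P)
a1 = all(IMPLY(LT(p), p) == T for p in PROPS)

# A2: (IMPLY (LT (IMPLY P Q)) (IMPLY (LT P) (LT Q)))
a2 = all(IMPLY(LT(IMPLY(p, q)), IMPLY(LT(p), LT(q))) == T
         for p in PROPS for q in PROPS)

# A3: (OR (LT P) (LT (NOT (LT P))))
a3 = all(OR(LT(p), LT(NOT(LT(p)))) == T for p in PROPS)

# By contrast, (IMPLY P (LT P)) is not valid: it fails for contingent P.
collapse = all(IMPLY(p, LT(p)) == T for p in PROPS)

print(a1, a2, a3, collapse)  # True True True False
```

The last line shows why R0 is a rule about assertions and not an axiom (IMPLY P (LT P)): a proposition that is merely true in some worlds is not thereby logically true.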

The axiom A4 states that a proposition is logically true if it is true in all worlds. Thus it expresses the
converse of Leibniz's intuition that something is logically true only if it is true in all worlds: "The truth of these
[necessary propositions] is eternal; not only will they hold whilst the world remains but they would have held even if
God had created the world in another way." [Leibniz2] We therefore call this axiom Leibniz's world axiom. We say
that a proposition P is a world iff P is possible and P is complete, that P is complete iff for all Q, P determines Q,
that P determines Q iff P entails Q or P entails not Q, that P entails Q iff it is logically true that P implies Q, and
that P is possible iff it is not the case that not P is logically true. These definitions are given below:

(WORLD P) =df (AND(POS P)(COMPLETE P)) ;P is a world

(COMPLETE P) =df (ALL Q(DET P Q)) ;P is complete

(DET P Q) =df (OR(ENTAIL P Q)(ENTAIL P(NOT Q))) ;P determines Q

(ENTAIL P Q) =df (LT(IMPLY P Q)) ;P entails Q

(POS P) =df (NOT(LT(NOT P))) ;P is possible

Thus a world is a possible proposition which for every proposition entails it or its negation. Axiom A4 therefore
eliminates from the interpretations of the modal logic Z those complete Boolean algebras which are not atomic. This
axiom has been used by a number of authors in developing modal logics, in particular [Prior, Brown1,2,3,4].
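
To make the definitions of WORLD, COMPLETE, DET and ENTAIL concrete, the following Python sketch evaluates them in an assumed finite model in which propositions are sets of points over two generators. The propositional quantifier ALL is implemented as a conjunction over every proposition of the model, which is only possible because the model is finite; `POINTS` and `PROPS` are illustrative names.

```python
from itertools import combinations, product

# Assumed finite model: four points, sixteen propositions.
POINTS = frozenset(product([False, True], repeat=2))
T, NIL = POINTS, frozenset()

def NOT(p): return POINTS - p
def AND(p, q): return p & q
def OR(p, q): return p | q
def IMPLY(p, q): return NOT(p) | q
def LT(p): return T if p == T else NIL
def POS(p): return NOT(LT(NOT(p)))

PROPS = [frozenset(c) for r in range(len(POINTS) + 1)
         for c in combinations(sorted(POINTS), r)]

def ALL(f):
    """Propositional quantifier: the conjunction of f(q) over every q."""
    result = T
    for q in PROPS:
        result = AND(result, f(q))
    return result

def ENTAIL(p, q): return LT(IMPLY(p, q))
def DET(p, q): return OR(ENTAIL(p, q), ENTAIL(p, NOT(q)))
def COMPLETE(p): return ALL(lambda q: DET(p, q))
def WORLD(p): return AND(POS(p), COMPLETE(p))

# The propositions that qualify as worlds are exactly the atoms.
worlds_found = [p for p in PROPS if WORLD(p) == T]
print(sorted(len(p) for p in worlds_found))  # [1, 1, 1, 1]

# A4: (IMPLY (ALL Q (IMPLY (WORLD Q) (LT (IMPLY Q P)))) (LT P))
a4_holds = all(
    IMPLY(ALL(lambda q: IMPLY(WORLD(q), ENTAIL(q, p))), LT(p)) == T
    for p in PROPS)
print(a4_holds)  # True
```

In this model the worlds defined in terms of LT coincide with the atoms of the Boolean algebra, and A4 holds because the algebra is atomic.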

From the standpoint of Kripke semantics [Kripke] it may seem strange that we define worlds in terms of logical
truth: LT, instead of defining logical truth in terms of worlds. The reason we do this is that logical truth is a
more intuitively primitive concept than the concept of a world, as [Rescher] points out: "The crucial advantage of this
procedure is an epistemological one: we know reasonably well how to get a logic so as to be able to go on from
there by constructive means, but we have no intellectual intuition to provide us with direct, non-constructive access
to a realm of possible worlds (nor is there any deus ex machina to waft us thither)." Thus for us, as for [Rescher]:
"Necessary truths from this standpoint, are not necessary because they are 'true in all possible worlds'; au contraire,
possible worlds are so, i.e. are possible, because they do not conflict with truths that qualify as necessary on
independent grounds".

The use of propositional quantifiers such as (ALL Q ...) in Leibniz's axiom: A4 and in the definition of
COMPLETE is of course nothing new, as propositional quantifiers have been an enduring feature of both (intentional)
quantificational logics such as [Carnap] and of (extensional) quantificational logics beginning with Frege's discovery
of quantificational logic in Begriffsschrift [Frege], and continuing in the great Polish works on logic such as Lesniewski's
protothetic [Lesniewski], in all higher order logics as the zero arity variables for the zero arity verbs of the logic, and
in Morse's set theory [Morse]. The underlying (extensional) quantificational logic of our modal logic Z may either
treat propositions as a separate sort, thus requiring a sorted logic, or treat propositions as being normal
objects by giving a propositional interpretation for every object, for example in the manner in which LISP's
logical functions interpret every atom except NIL as being true [McCarthy1]. In either case, it is important to note
that all the normal laws of extensional quantificational logic, including the laws for substitution of quantified
variables, also hold for any complete atomic Boolean algebra, and therefore are compatible with the axioms of the modal
logic Z. Thus the assumption made in model theoretic semantics that extensional logics inherently involve only the
simplest Boolean algebra consisting of the two propositions truth: T and falsity: NIL is incorrect. Therefore, we do
not say that a proposition is inherently true or false, but say instead that a proposition is true or false in some world.
Thus if W is a world and P is a proposition then we say that P is true in W iff W entails P, and that P is false in W
iff W entails not P:

(IS-TRUE-IN W P) =df (ENTAIL W P)

(IS-FALSE-IN W P) =df (ENTAIL W(NOT P))

If we do not wish to speak about a world when speaking about a proposition it is necessary to divide the propositions
into 3 disjoint groups as did Leibniz in his 1686 essay on "Necessary and Contingent Truths" [Leibniz] where he
wrote: "That which lacks such necessity I call CONTINGENT, but that which implies a contradiction, or whose
opposite is NECESSARY, is called IMPOSSIBLE. The rest are called possible." Thus, Leibniz divides propositions
into three categories: those which are necessary, those which are contingent, and those which are impossible or
contradictory. These propositions may be interpreted as being essentially the elements of a complete atomic Boolean algebra.
We say that P is necessary iff P is logically true, that P is impossible (i.e. logically false) iff not P is logically true,
and that P is contingent iff P is not logically true and P is not logically false:

(NECESSARY P) =df (LT P) ;P is necessary

(CONTINGENT P) =df (AND(NOT(LT P))(NOT(LT(NOT P)))) ;P is contingent

(IMPOSSIBLE P) =df (LT(NOT P)) ;P is impossible
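
The three categories can be tabulated in a small model. The following Python sketch assumes a finite algebra of propositions over two generators and classifies each proposition with the three definitions above; the `classification` helper and the choice of two generators are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product

# Assumed finite model: two generators, sixteen propositions.
WORLDS = frozenset(product([False, True], repeat=2))
T, NIL = WORLDS, frozenset()

def NOT(p): return WORLDS - p
def AND(p, q): return p & q
def LT(p): return T if p == T else NIL

def NECESSARY(p): return LT(p)
def CONTINGENT(p): return AND(NOT(LT(p)), NOT(LT(NOT(p))))
def IMPOSSIBLE(p): return LT(NOT(p))

PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(sorted(WORLDS), r)]

def classification(p):
    """Names of the categories (out of the three) to which p belongs."""
    tests = {"necessary": NECESSARY, "contingent": CONTINGENT,
             "impossible": IMPOSSIBLE}
    return [name for name, test in tests.items() if test(p) == T]

# Every proposition falls into exactly one category.
disjoint_and_exhaustive = all(len(classification(p)) == 1 for p in PROPS)
counts = Counter(classification(p)[0] for p in PROPS)

print(disjoint_and_exhaustive)  # True
print(dict(counts))
```

In this model only T is necessary and only NIL is impossible; the remaining fourteen propositions are contingent.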

The axiom A5 states that the meaning of every conjunction of the generated contingent propositions or their
negations is possible. We call this axiom "The Axiom of the Possibility of Contingent Facts" or simply the
"Possibility Axiom". The need for this axiom follows from the fact that the other axioms of the modal logic do not imply
certain elementary facts about the possibility of conjunctions of distinct possibly negated atomic expressions
consisting of nonlogical symbols. For example, if we have a theory formulated in our modal logic which contains the
nonlogical atomic expression (ON A B) then since (ON A B) is not logically true, it follows that (NOT(ON A B)) must
be possible. Yet (POS(NOT(ON A B))) does not follow from these other axioms. Likewise, since (NOT(ON A B))
is not logically true (ON A B) must be possible. Yet (POS(ON A B)) does not follow from the other axioms. Thus
these contingent propositions (ON A B) and (NOT(ON A B)) need to be asserted to be possible. There are a number
of ways in which this may be done and these ways essentially correspond to different ways the idiom: (P is a
meaning combination of the generators) may be rendered. In this paper we have chosen a general method which is
applicable to just about any contingent theory one wishes. This rendering is given below:

(meaning of the generator subset S) =df
    (ALL G(IMPLY(GENERATORS G)
                (IFF(S G)(GMEANING G)) ))

(GMEANING `(p ,X1 ... ,XN)) =df (p (GMEANING X1) ... (GMEANING XN))
    for every contingent symbol p of arity n.

(GENERATORS) =df (LAMBDA(A)(A is a contingent variable-free simple sentence))

We say that the meaning of the generator subset S is the conjunction of the GMEANINGs of every generator in S
and the negation of the GMEANINGs of all the generators not in S. The generator meaning of any expression
beginning with a contingent symbol 'p is p of the GMEANING of its arguments. The generators are simply any
contingent variable-free atomic sentences we wish to use. The GMEANINGs of the generators may be interpreted
essentially as being the generators of a complete atomic Boolean algebra. Thus if there are N generators then there will be
(EXP 2(EXP 2 N)) propositions.
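
The count (EXP 2(EXP 2 N)) can be confirmed for small N by generating the algebra explicitly. The Python sketch below is an illustration for the finite case only: it assumes that the N generators are independent, represents each generator as the set of truth assignments in which it holds, and closes the generators under negation and conjunction. The function name `generated_algebra` is hypothetical.

```python
from itertools import product

def generated_algebra(n):
    """Close n independent generators under NOT and AND; return all propositions."""
    worlds = frozenset(product([False, True], repeat=n))
    # Generator i is the set of assignments in which the i-th sentence is true.
    generators = [frozenset(w for w in worlds if w[i]) for i in range(n)]
    algebra = set(generators) | {worlds}
    changed = True
    while changed:
        changed = False
        for p in list(algebra):
            for q in list(algebra):
                for new in (worlds - p, p & q):   # negation and conjunction
                    if new not in algebra:
                        algebra.add(new)
                        changed = True
    return algebra

sizes = [len(generated_algebra(n)) for n in range(4)]
print(sizes)  # [2, 4, 16, 256]
```

Each size equals 2 raised to the number of worlds, and the number of worlds is itself 2 raised to N.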


For example, a contingent language with a single contingent propositional function 'P and names 'A and 'B gives
rise to two contingent generators: '(P A) and '(P B). The GENERATORS and GMEANING functions for this
language are defined as:

(GENERATORS) =df {'(P A) '(P B)}

{P1 ... Pn} =df (LAMBDA(X)(OR(EQUAL X P1)...(EQUAL X Pn)))

(GMEANING `(P ,X)) = (P (GMEANING X))

(GMEANING 'A) = A
(GMEANING 'B) = B


and the Possibility Axiom simplifies as follows:


(ALL S(POS(meaning of the generator subset S)))

(ALL S(POS(ALL G(IMPLY(GENERATORS G)
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(IMPLY({'(P A) '(P B)}G)
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(IMPLY((LAMBDA(X)(OR(EQUAL X '(P A))(EQUAL X '(P B))))G)
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(IMPLY(OR(EQUAL G '(P A))(EQUAL G '(P B)))
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(AND(IMPLY(EQUAL G '(P A)) (IFF(S G)(GMEANING G)))
                    (IMPLY(EQUAL G '(P B)) (IFF(S G)(GMEANING G))) ))))

(ALL S(POS(AND(ALL G(IMPLY(EQUAL G '(P A)) (IFF(S G)(GMEANING G))))
              (ALL G(IMPLY(EQUAL G '(P B)) (IFF(S G)(GMEANING G)))) )))

(ALL S(POS(AND(IFF(S '(P A))(GMEANING '(P A)))
              (IFF(S '(P B))(GMEANING '(P B))) )))

(ALL S(POS(AND(IFF(S '(P A))(P(GMEANING 'A)))
              (IFF(S '(P B))(P(GMEANING 'B))) )))

(ALL S(POS(AND(IFF(S '(P A))(P A))
              (IFF(S '(P B))(P B)) )))

(AND(POS(AND(P A)(P B)))
    (POS(AND(P A)(NOT(P B))))
    (POS(AND(NOT(P A))(P B)))
    (POS(AND(NOT(P A))(NOT(P B)))) )
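
The role of the Possibility Axiom can be illustrated by comparing two finite interpretations. In the Python sketch below, which is an assumption-laden illustration and not part of the axiomatization, an interpretation is a set of world labels together with a mapping from each generator to the set of worlds in which its GMEANING is true. The labels w1 to w4, the dictionaries `free` and `tied`, and the function names are all hypothetical.

```python
from itertools import combinations

def meaning_of_subset(s, gmeaning, worlds):
    """Conjunction of the generators in s and the negations of the others."""
    everything = frozenset(worlds)
    meaning = everything
    for generator, prop in gmeaning.items():
        meaning &= prop if generator in s else everything - prop
    return meaning

def possibility_axiom_holds(gmeaning, worlds):
    """A5 for a finite generator set: every subset meaning is true somewhere."""
    names = sorted(gmeaning)
    subsets = [set(c) for r in range(len(names) + 1)
               for c in combinations(names, r)]
    return all(len(meaning_of_subset(s, gmeaning, worlds)) > 0 for s in subsets)

# Interpretation 1: (P A) and (P B) vary independently over four worlds.
free_worlds = {"w1", "w2", "w3", "w4"}
free = {"(P A)": frozenset({"w1", "w2"}), "(P B)": frozenset({"w1", "w3"})}

# Interpretation 2 (hypothetical): (P A) and (P B) agree in every world.
tied_worlds = {"w1", "w4"}
tied = {"(P A)": frozenset({"w1"}), "(P B)": frozenset({"w1"})}

print(possibility_axiom_holds(free, free_worlds))  # True
print(possibility_axiom_holds(tied, tied_worlds))  # False
```

The second interpretation is still a complete atomic Boolean algebra, yet (AND(P A)(NOT(P B))) is not possible in it. This is the kind of interpretation that the other axioms admit and that the Possibility Axiom rules out.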


The above possibility axiom involves a number of noncontingent symbols such as names of expressions '(P A) '(P
B) 'A 'B, the recursive propositional function GMEANING, the set of GENERATORS, and the second order logic
notion of application (i.e. a set theoretic concept of elementhood) and LAMBDA abstraction. These concepts must be
logically true because they must be the same for every world. For example it would make no sense to say that '(P A)
means one thing in one world and something else in another world, because its meaning is independent of the world
in which the expression is uttered. Whether the meaning (P A) of the expression '(P A) is true or false in a world will
of course depend on that world. For example the meaning of '(P A) is true in the world: (AND(P A)(P B)...) and
false in the world (AND(NOT(P A))(P B)...). But the fact that (GMEANING '(P A)) equals (P A) is logically true.
Thus the symbols of our logic are divided into two groups: the contingent symbols whose names are allowed to
occur in the set of GENERATORS, and the noncontingent symbols (i.e. the symbols of classical extensional logic
consisting of LT, symbols defined in terms of LT, syntax symbols, GMEANING, and GENERATORS) whose names
are not allowed to occur in any name in the set of GENERATORS. (Of course these restrictions do not preclude us
from axiomatizing for example a contingent set theory within a given world, but that contingent set theory has
nothing to do with the sets of the noncontingent second order logic and in fact will be expressed with new contingent
symbols.)

Although we have not done so for reasons of presentation, it should be noted that the recursively defined
GMEANING concept could have been explicitly defined in the modal logic Z using the well known method [Frege]
of explicitly defining recursive functions in second order logic. It is for this reason we say that this modal logic Z is
a logically true theory. Alternatively, it should be noted that if one disallows the nesting of contingent function
symbols except for constants (as is often done in (extensional) first order quantificational logic) then the
recursion inherent in the GMEANING definition can be eliminated, thus making this definition explicit anyway.

If the set of GENERATORS is finite then the possibility axiom reduces, in a manner similar to the above
derivation, to a conjunction of sentences stating that any conjunction of simple sentences or their negations is possible,
and this resulting sentence is entirely expressed within the modal logic Z based on an underlying (extensional) first
order quantificational logic. However, it is important to note that finiteness of the generator set is not required by
our modal logic and that the possibility axiom A5 will provide the necessary possibilities as theorems for any
contingent language. For example, the fact that the conjunction of the P of all natural numbers is possible can be derived
as follows from the infinite generator set consisting of all simple sentences of the form `(P ,N) where N is a
numeral. (Syntactic verbs such as NUMERAL herein are of course not contingent symbols.)

(GENERATORS) =df (LAMBDA(X)(EX N(AND(NUMERAL N)(EQUAL X `(P ,N)))))

(GMEANING `(P ,N)) = (P (GMEANING N))

(GMEANING `(ADD1 ,N)) = (ADD1 (GMEANING N))

(GMEANING '1) = 1

(ALL S(POS(meaning of the generator subset S)))

(ALL S(POS(ALL G(IMPLY(GENERATORS G)
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(IMPLY((LAMBDA(X)(EX N(AND(NUMERAL N)(EQUAL X `(P ,N)))))G)
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(IMPLY(EX N(AND(NUMERAL N)(EQUAL G `(P ,N))))
                      (IFF(S G)(GMEANING G)) ))))

(ALL S(POS(ALL G(ALL N(IMPLY(AND(NUMERAL N)(EQUAL G `(P ,N)))
                            (IFF(S G)(GMEANING G)) )))))

(ALL S(POS(ALL N(IMPLY(NUMERAL N)
                      (IFF(S `(P ,N))(GMEANING `(P ,N))) ))))

(ALL S(POS(ALL N(IMPLY(NUMERAL N)
                      (IFF(S `(P ,N))(P(GMEANING N))) ))))

Thus if S is the universe it follows that:

(POS(ALL N(IMPLY(NUMERAL N)
                (IFF(UNIVERSE `(P ,N))(P(GMEANING N))) )))

(POS(ALL N(IMPLY(NUMERAL N) (IFF T(P(GMEANING N))) )))

(POS(ALL N(IMPLY(NUMERAL N) (P(GMEANING N))) ))

which intuitively is:

(POS(AND(P 1)...))


In his 1696 paper "On the Principle of Indiscernibles" Leibniz wrote: "For all things which are different must be
distinguished in some way". Thus we say that two things are equal iff every property which is the meaning of an
expression constructed from contingent function symbols and which holds for the first thing also holds for the second
thing:

(EQUAL X Y) =df (ALL p which are contingent(IMPLY(p X)(p Y))) ;equal

For example, if the contingent GENERATORS are {'(P A) '(P B)}
then this definition scheme could be rendered:

(EQUAL X Y) =df (AND(IMPLY(P X)(P Y))
                    (IMPLY(NOT(P X))(NOT(P Y))) )

We  can  then  say  that  two  things  are  equal  in  a  world  W  iff  the  fact  that  they  are  equal  is  true  in  W: 

(EQUAL-IN W X Y) =df (IS-TRUE-IN W(EQUAL X Y)) ;equal in a world

The concept of being equal in a world should be clearly distinguished from the concept of being equal in all worlds:

(LT-EQUAL X Y) =df (LT(EQUAL X Y)) ;equal in all worlds
as  the  confusion  between  these  concepts  seems  to  be  one  of  the  enduring  themes  in  philosophical  logic.  For 
example,  by  these  definitions  all  the  following  statements  are  true: 

(EQUAL-IN(AND(P  A)(P  B))  A  B) 

(NOT(EQUAL-IN(AND(P  A)(NOT(P  B)))  A  B)) 

(NOT(EQUAL-IN(AND(NOT(P A))(P B)) A B))

(EQUAL-IN(AND(NOT(P A))(NOT(P B))) A B)

(NOT(LT-EQUAL  A  B)) 
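
The five statements above can be checked mechanically. The following is a minimal sketch, not the author's implementation: it assumes a world is simply a truth assignment to the two contingent generators (P A) and (P B), and it reads (EQUAL-IN W X Y) as "every world satisfying W satisfies (EQUAL X Y)". The names `WORLDS`, `equal`, `equal_in`, and `lt_equal` are illustrative only.

```python
from itertools import product

# Assumption: a world is a truth assignment to the contingent generators
# (P A) and (P B). All names here are hypothetical illustrations.
WORLDS = [dict(zip("AB", bits)) for bits in product([False, True], repeat=2)]

def equal(x, y, world):
    """(EQUAL X Y) for the single contingent property P, in one world."""
    px, py = world[x], world[y]
    # (AND (IMPLY (P X) (P Y)) (IMPLY (NOT (P X)) (NOT (P Y))))
    return (not px or py) and (px or not py)

def equal_in(w, x, y):
    """(EQUAL-IN W X Y): every world satisfying W also satisfies (EQUAL X Y)."""
    return all(equal(x, y, world) for world in WORLDS if w(world))

def lt_equal(x, y):
    """(LT-EQUAL X Y): (EQUAL X Y) holds in every world."""
    return all(equal(x, y, world) for world in WORLDS)

both = lambda w: w["A"] and w["B"]
only_a = lambda w: w["A"] and not w["B"]
only_b = lambda w: not w["A"] and w["B"]
neither = lambda w: not w["A"] and not w["B"]

print(equal_in(both, "A", "B"), equal_in(only_a, "A", "B"),
      equal_in(only_b, "A", "B"), equal_in(neither, "A", "B"),
      lt_equal("A", "B"))
```

Under these assumptions A and B are equal in the two worlds where P agrees on them, unequal in the other two, and therefore not equal in all worlds.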

A  simple  concept  of  the  transworld  identity  of  two  objects  can  then  be  expressed  by  saying  that  for  every  contingent 
property  p  which  is  an  essential  property  of  objects  if  the  first  object  has  that  property  in  the  first  world  then  the 
second  object  has  that  property  in  the  second  world:

(SAME X W1 Y W2) =df

(ALL p which are contingent and essential

(IMPLY(TRUE-IN W1(p X))(TRUE-IN W2(p Y))))

For example, if the contingent GENERATORS are {'(P A) '(P B)} and if P is a property which is essential to the
description of an object then this definition scheme could be rendered:

(SAME X W1 Y W2) =df (AND(TRUE-IN W1(IMPLY(P X)(P Y)))

(TRUE-IN W2(IMPLY(NOT(P X))(NOT(P Y)))) )
and  the  following  proposition  can  be  seen  to  be  true: 

(SAME A(AND(P A)(NOT(P B)))B(AND(NOT(P A))(P B)))

=(AND(IMPLY(TRUE-IN(AND(P A)(NOT(P B)))(P A))
            (TRUE-IN(AND(NOT(P A))(P B))(P B)))
      (IMPLY(TRUE-IN(AND(P A)(NOT(P B)))(NOT(P A)))
            (TRUE-IN(AND(NOT(P A))(P B))(NOT(P B)))) )

=T


The value of the modal logic Z is that it models our commonsense reasoning more directly than does classical
logic in that, in addition to the propositional objects NIL and T, it allows the use of ideal propositional objects
[Hilbert] which can be used as objects of various kinds of reasoning; for example, as objects of belief, as objects of
knowledge, or as objects of obligation. For example, the commonsense notion that a robot believes (or at least that
the robot should believe) that which is entailed by its beliefs, and that the robot can conceive that which is not
contradicted by its beliefs, can be directly defined by explicit definitions of the modal logic Z as follows:

(BELIEVES ROBOT P) =df (ENTAIL(BELIEFS ROBOT)P)

(CONCEIVABLE ROBOT P) =df (NOT(BELIEVES ROBOT(NOT P)))

(BELIEFS ROBOT) =df (the conjunction of contingent propositions believed by the robot)

The commonsense notion that a robot knows that which is a true belief can be directly defined with the explicit
definition:

(KNOW ROBOT P) =df (AND P(BELIEVES ROBOT P))

Likewise, the commonsense notion that a robot must do that which is entailed by its obligations and may do that which
is not contradicted by its obligations can be directly defined with the explicit definitions:

(MUST ROBOT P) =df (ENTAIL(OBLIGATIONS ROBOT)P)

(MAY ROBOT P) =df (NOT(MUST ROBOT(NOT P)))

(OBLIGATIONS ROBOT) =df (the conjunction of contingent propositions which are obligations of the robot)
Thus we see that the basic concepts of doxastic logic (the logic of belief), epistemic logic (the logic of knowledge),
and deontic logic (the logic of ethics) can be explicitly defined in a commonsense manner which precisely models our
intuitive understanding of these concepts. This commonsense approach to specifying the properties of intentional
concepts is an amazing contrast to the unintuitiveness of previous methods of specifying them by [Kripke]'s
extension of traditional semantic methods [Tarski]. Just to pick one example of the consequences of using such unintuitive
methods, consider the otherwise acceptable paper [Moore1] where the concept of knowledge is (incorrectly) specified
by the Kripke relation to be an S5 modal logic.

The consistency of the modal logic Z relative to complete atomic Boolean algebras follows by interpreting LT as
the Boolean function which maps every proposition except T into NIL. The modal quantificational logic Z is
described in greater detail in [Brownl 43,4]. We now use Z to develop a commonsense theory of nonmonotonicity in
the following section.


3.  THE  REFLEXIVE  NONMONOTONIC  THEORY 

One of the most striking features of nonmonotonic knowledgebases is that they are sometimes described in terms
of themselves. Such knowledgebases are said to be reflexive [Hayes2]. For example, the knowledgebase K
purportedly defined by the axiom:

(SYN  K  (IMPLY(POS(AND  K  A))A) ) 

is  defined  as  being  synonymous  to  the  default:  (IMPLY(POS(AND  K  A))A)  which  in  turn  is  defined  in  terms  of  K. 
Thus  this  purported  definition  of  K  is  not  actually  a  definition  at  all  but  is  merely  an  axiom  describing  the  properties 
possessed  by  any  knowledgebase  K  satisfying  this  axiom.  In  general,  a  purported  definition  of  a  knowledgebase: 
(SYN  K(f  K))

will  be  implied  by  zero  or  more  explicit  definitions  of  the  form: 

(SYN  K  g)

where K does not occur in g. The explicit definitions which imply a purported definition of a knowledgebase are
called the solutions of that purported definition. In general a purported definition may have zero or more solutions. For
example, (SYN K(NOT K)) is (LT(IFF K(NOT K))) which is (LT NIL) which is NIL and therefore has no solutions,
and (SYN K K) is (LT(IFF K K)) which is (LT T) which is T and therefore has all solutions. Finally, (SYN K G)
where K does not occur in G is an explicit definition of K and therefore has only one solution, namely itself.
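
The three cases above (no solutions, all solutions, exactly one solution) can be reproduced by brute force. This is a sketch under stated assumptions rather than the paper's method: it assumes a finite model with one contingent generator A (two worlds), represents a proposition as a bitmask over worlds, and reads (SYN K (f K)) as equality of the two bitmasks. The names `solutions`, `NOT`, and `A` are hypothetical.

```python
# Assumption: one contingent generator A, hence two worlds. A proposition
# is a bitmask over worlds; T is every world and NIL is no world.
N_WORLDS = 2
T = (1 << N_WORLDS) - 1
NIL = 0
A = 0b10  # hypothetical generator: true in world 1 only

def NOT(p):
    return T & ~p

def solutions(f):
    """All propositions K with (SYN K (f K)), i.e. K == f(K) as world sets."""
    return [k for k in range(T + 1) if k == f(k)]

no_solution = solutions(lambda k: NOT(k))   # (SYN K (NOT K))
all_solutions = solutions(lambda k: k)      # (SYN K K)
explicit = solutions(lambda k: A)           # (SYN K A), K does not occur in A

print(no_solution, all_solutions, explicit)
```

In this small model the self-negating definition is satisfied by no proposition, the trivial definition by all four, and the explicit definition only by A itself.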

Because K is the knowledgebase under discussion, it is not itself a contingent proposition of that
knowledgebase. Thus K is not a GENERATOR and the possibility axiom AS of section 2 will not apply to it. This is
verified by the above example (SYN K NIL) where K consists of the false expression, and thus is not possible.

As a more complex example, consider the knowledgebase K which implements an SR flipflop as two Boolean
NOR gates connected together in the following manner.

[Figure: circuit diagram of the SR flipflop. Two cross-coupled NOR gates with set input S, reset input R, and output K.]

Memory circuits such as this SR flipflop may, unlike nonmemory circuits, involve self-reference.
Thus their defining equations do not constitute explicit definitions. This memory circuit can quite easily be
represented in our logic by letting the output K be synonymous with the expression resulting from tracing the K wire
backward through the NOR gates:

(SYN  K(NOR  R(NOR  S  K))) 

Memory circuits, like any other circuits, satisfy various properties. For example, if the reset line R is false and the
set line S is true then there is one solution for K, namely that K is true. Furthermore, if the reset line R is true there
is also only one solution for K, namely that K is false. However, in any other case this purported definition is
implied by any solution. These simple facts can indeed be derived from the logic representation of this circuit as follows.

The SR flipflop theorem:

(IFF(SYN K(NOR R(NOR S K)))
    (LT(AND(IMPLY R(NOT K))(IMPLY(AND(NOT R)S)K))) )
proof

(SYN K(NOT(OR R(NOT(OR S K)))))

(SYN K(AND(NOT R)(OR S K)))

(LT(IFF K(AND(NOT R)(OR S K))))

(LT(IFF K(IF R NIL(IF S T K))))

(LT(IF R(IFF K NIL)(IF S(IFF K T)(IFF K K))))

(LT(IF R(NOT K)(IF S K T)))

(LT(AND(IMPLY R(NOT K))(IMPLY(AND(NOT R)S)K)))
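
The SR flipflop theorem can be sanity-checked exhaustively in a small finite model. The sketch below is an illustration only: it assumes four worlds, treats propositions as bitmasks over those worlds, reads SYN as equality of bitmasks, and reads (LT p) as "p is true in every world". The function names `flipflop_solutions` and `theorem_holds` are hypothetical.

```python
from itertools import product

# Assumption: a finite model with four worlds; propositions are bitmasks.
N_WORLDS = 4
T = (1 << N_WORLDS) - 1
NIL = 0

def NOT(p):
    return T & ~p

def NOR(p, q):
    return NOT(p | q)

def IMPLY(p, q):
    return NOT(p) | q

def flipflop_solutions(r, s):
    """All K with (SYN K (NOR R (NOR S K))) for fixed R and S."""
    return [k for k in range(T + 1) if k == NOR(r, NOR(s, k))]

def theorem_holds():
    """Compare both sides of the SR flipflop theorem for every K, R, S."""
    for k, r, s in product(range(T + 1), repeat=3):
        lhs = k == NOR(r, NOR(s, k))
        rhs = (IMPLY(r, NOT(k)) & IMPLY(NOT(r) & s, k)) == T
        if lhs != rhs:
            return False
    return True

print(theorem_holds())
print(flipflop_solutions(NIL, T), flipflop_solutions(T, NIL),
      len(flipflop_solutions(NIL, NIL)))
```

With R false and S true the only solution is K = T, with R true the only solution is K = NIL, and with both lines false every proposition is a solution, which matches the three cases described in the text.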


In order to make use of a knowledgebase it is helpful to know what is actually in that knowledgebase. For a
nonreflexive knowledgebase (i.e. a knowledgebase defined by an explicit definition) this is no problem because there is
obviously only one solution, namely that explicit definition itself. However, in the more general case of a purported
definition there may be any number of solutions. Thus the basic goal of a theory of nonmonotonicity of reflexive
knowledgebases must be to describe the solutions for various kinds of purported definitions.

The first kind of purported definition we consider is a knowledgebase K consisting of (a conjunction of) axioms G
not containing K plus one additional standard default axiom. A standard default axiom is an axiom of the form:

(IMPLY(POS(AND K A))(IMPLY B A))

This  structure  contains  as  instances  default  axioms  such  as: 

(IMPLY(POS(AND  K(CAN-FLY  ENTERPRISE))) 

(IMPLY(IS-SPACE-SHUTTLE  ENTERPRISE)(CAN-FLY  ENTERPRISE)))

T1: A knowledgebase containing exactly one variable-free standard default has precisely one solution.

(IFF (SYN K(AND G(IMPLY(POS(AND K A))(IMPLY B A))))

     (SYN K(AND G(IMPLY(POS(AND G A))(IMPLY B A)))) )


proof 

(SYN K(AND G(IMPLY(POS(AND K A))(IMPLY B A))))

(IF (POS(AND K A))
    (SYN K(AND G(IMPLY(AND B T)A)))
    (SYN K(AND G(IMPLY(AND B NIL)A))) )

(IF (POS(AND K A))
    (SYN K(AND G(IMPLY B A)))
    (SYN K G))

(OR (AND(POS(AND K A))(SYN K(AND G(IMPLY B A))))
    (AND(NOT(POS(AND K A)))(SYN K G)) )

(OR (AND(POS(AND G(IMPLY B A)A))(SYN K(AND G(IMPLY B A))))
    (AND(NOT(POS(AND G A)))(SYN K G)) )

(OR (AND(POS(AND G A))(SYN K(AND G(IMPLY B A))))
    (AND(NOT(POS(AND G A)))(SYN K G)) )

(IF (POS(AND G A))
    (SYN K(AND G(IMPLY B A)))
    (SYN K G))

(SYN K(IF(POS(AND G A))(AND G(IMPLY B A))G))

(SYN K(AND G(IF(POS(AND G A))(IMPLY B A)T)))

(SYN K(AND G(IMPLY(POS(AND G A))(IMPLY B A))))
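
As a cross-check on the derivation, T1 can be tested exhaustively in a small finite model. This sketch is not the paper's proof: it assumes four worlds, propositions as bitmasks, SYN as bitmask equality, and (POS p) as T when p is true in at least one world and NIL otherwise, following the Boolean-algebra reading of LT given in section 2. The name `t1_holds` is hypothetical.

```python
from itertools import product

# Assumption: a finite model with four worlds; propositions are bitmasks.
N_WORLDS = 4
T = (1 << N_WORLDS) - 1
NIL = 0

def NOT(p):
    return T & ~p

def IMPLY(p, q):
    return NOT(p) | q

def POS(p):
    """Possibility: T if p holds in some world, otherwise NIL."""
    return T if p != NIL else NIL

def t1_holds():
    """Check both sides of T1 agree for every choice of G, A, B, K."""
    for g, a, b, k in product(range(T + 1), repeat=4):
        reflexive = k == (g & IMPLY(POS(k & a), IMPLY(b, a)))
        explicit = k == (g & IMPLY(POS(g & a), IMPLY(b, a)))
        if reflexive != explicit:
            return False
    return True

print(t1_holds())
```

The check passing means that, in this model, replacing K by G inside the possibility test never changes which K satisfy the definition, which is exactly what lets a single standard default be rewritten as an explicit definition.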


The solutions to the two purported definitions discussed in section 1 are obtained from theorem T1 as corollaries, for
if G is T, B is T, and A is possible it follows that:

(IFF(SYN  K(IMPLY(POS(AND  K  A))A)) 

(SYN  K  A)) 

and  if  G  is  (NOT  A)  and  B  is  T  it  follows  that: 

(IFF(SYN  K(AND(NOT  A)(IMPLY(POS(AND  K  A))A)))

(SYN  K(NOT  A))) 

T1 shows that a knowledgebase with only one variable-free standard default has the same essential status as an
explicit definition. The next theorem, T2, shows that this is not the case for a knowledgebase with two variable-free
standard defaults.

T2:  A  knowledgebase  consisting  of  two  variable-free  standard  defaults  has  precisely  one  or  two  solutions. 

(IFF(SYN K(AND G(IMPLY(POS(AND K A1))(IMPLY B1 A1))
                (IMPLY(POS(AND K A2))(IMPLY B2 A2)) ))
    (IF(POS(AND G(IMPLY B2 A2)A1))
       (IF(POS(AND G(IMPLY B1 A1)A2))
          (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)))
          (SYN K(AND G(IMPLY B1 A1))))
       (IF(POS(AND G(IMPLY B1 A1)A2))
          (SYN K(AND G(IMPLY B2 A2)))
          (IF(POS(AND G A1))
             (IF(POS(AND G A2))
                (OR(SYN K(AND G(IMPLY B1 A1)))
                   (SYN K(AND G(IMPLY B2 A2))))
                (SYN K(AND G(IMPLY B1 A1))))
             (IF(POS(AND G A2))
                (SYN K(AND G(IMPLY B2 A2)))
                (SYN K G)) ))) )


proof 

(SYN K(AND G(IMPLY(POS(AND K A1))(IMPLY B1 A1))
            (IMPLY(POS(AND K A2))(IMPLY B2 A2)) ))

(IF(POS(AND K A1))
   (IF(POS(AND K A2))
      (SYN K(AND G(IMPLY T(IMPLY B1 A1))(IMPLY T(IMPLY B2 A2))))
      (SYN K(AND G(IMPLY T(IMPLY B1 A1))(IMPLY NIL(IMPLY B2 A2)))) )
   (IF(POS(AND K A2))
      (SYN K(AND G(IMPLY NIL(IMPLY B1 A1))(IMPLY T(IMPLY B2 A2))))
      (SYN K(AND G(IMPLY NIL(IMPLY B1 A1))(IMPLY NIL(IMPLY B2 A2)))) ))

(IF(POS(AND K A1))
   (IF(POS(AND K A2))
      (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)))
      (SYN K(AND G(IMPLY B1 A1))) )
   (IF(POS(AND K A2))
      (SYN K(AND G(IMPLY B2 A2)))
      (SYN K G)))


(OR(AND(POS(AND K A1))(POS(AND K A2))
       (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2))))
   (AND(POS(AND K A1))(NOT(POS(AND K A2)))(SYN K(AND G(IMPLY B1 A1))))
   (AND(NOT(POS(AND K A1)))(POS(AND K A2))(SYN K(AND G(IMPLY B2 A2))))
   (AND(NOT(POS(AND K A1)))(NOT(POS(AND K A2)))(SYN K G)))

(OR(AND(POS(AND G(IMPLY B1 A1)(IMPLY B2 A2)A1))
       (POS(AND G(IMPLY B1 A1)(IMPLY B2 A2)A2))
       (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2))))
   (AND(POS(AND G(IMPLY B1 A1)A1))
       (NOT(POS(AND G(IMPLY B1 A1)A2)))
       (SYN K(AND G(IMPLY B1 A1))))
   (AND(NOT(POS(AND G(IMPLY B2 A2)A1)))
       (POS(AND G(IMPLY B2 A2)A2))
       (SYN K(AND G(IMPLY B2 A2))))
   (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G)))

(OR(AND(POS(AND G(IMPLY B2 A2)A1))
       (POS(AND G(IMPLY B1 A1)A2))
       (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2))))
   (AND(POS(AND G A1))
       (NOT(POS(AND G(IMPLY B1 A1)A2)))
       (SYN K(AND G(IMPLY B1 A1))))
   (AND(NOT(POS(AND G(IMPLY B2 A2)A1)))
       (POS(AND G A2))
       (SYN K(AND G(IMPLY B2 A2))))
   (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G)))

(IF(POS(AND G(IMPLY B2 A2)A1))
   (IF(POS(AND G(IMPLY B1 A1)A2))
      (OR(SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)))
         (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G)))
      (OR(AND(POS(AND G A1))
             (SYN K(AND G(IMPLY B1 A1))))
         (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G))))
   (IF(POS(AND G(IMPLY B1 A1)A2))
      (OR(AND(POS(AND G A2))
             (SYN K(AND G(IMPLY B2 A2))))
         (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G)))
      (OR(AND(POS(AND G A1))
             (SYN K(AND G(IMPLY B1 A1))))
         (AND(POS(AND G A2))
             (SYN K(AND G(IMPLY B2 A2))))
         (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G)))))

(IF(POS(AND G(IMPLY B2 A2)A1))
   (IF(POS(AND G(IMPLY B1 A1)A2))
      (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)))
      (SYN K(AND G(IMPLY B1 A1))))
   (IF(POS(AND G(IMPLY B1 A1)A2))
      (SYN K(AND G(IMPLY B2 A2)))
      (OR(AND(POS(AND G A1))
             (SYN K(AND G(IMPLY B1 A1))))
         (AND(POS(AND G A2))
             (SYN K(AND G(IMPLY B2 A2))))
         (AND(NOT(POS(AND G A1)))(NOT(POS(AND G A2)))(SYN K G)) )))

(IF(POS(AND G(IMPLY B2 A2)A1))
   (IF(POS(AND G(IMPLY B1 A1)A2))
      (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)))
      (SYN K(AND G(IMPLY B1 A1))))
   (IF(POS(AND G(IMPLY B1 A1)A2))
      (SYN K(AND G(IMPLY B2 A2)))
      (IF(POS(AND G A1))
         (IF(POS(AND G A2))
            (OR(SYN K(AND G(IMPLY B1 A1)))
               (SYN K(AND G(IMPLY B2 A2))))
            (SYN K(AND G(IMPLY B1 A1))))
         (IF(POS(AND G A2))
            (SYN K(AND G(IMPLY B2 A2)))
            (SYN K G)) )))

The  Alternatives  Corollary  to  T2: 

If G is (OR B1 B2) and (AND A1 A2 B1) is possible then T2 reduces to the proposition that:

(IFF(SYN K(AND(OR B1 B2)(IMPLY(POS(AND K A1))(IMPLY B1 A1))
                        (IMPLY(POS(AND K A2))(IMPLY B2 A2))))
    (SYN K(AND(OR B1 B2)(IMPLY B1 A1)(IMPLY B2 A2))) )

Thus K entails (OR A1 A2). This illustrates the importance of treating defaults as expressions rather than as
inference rules, because inference rules such as:

(from a deduction of (POS(AND K A1)) and a deduction of B1 infer A1)

(from a deduction of (POS(AND K A2)) and a deduction of B2 infer A2)
would not allow any deduction to be made since neither B1 nor B2 is a theorem of the knowledgebase.

The  Flipped  Coin  Corollary  to  T2: 

If A1 is A, A2 is (NOT A), B1 is T, and B2 is T then T2 reduces to the proposition that:

(IFF(SYN K(AND G(IMPLY(POS(AND K A))A)(IMPLY(POS(AND K(NOT A)))(NOT A))))
    (IF(POS(AND G A))
       (IF(POS(AND G(NOT A)))
          (OR(SYN K(AND G A))(SYN K(AND G(NOT A))))
          (SYN K(AND G A)))
       (IF(POS(AND G(NOT A)))(SYN K(AND G(NOT A)))(SYN K G)) ))

If (AND G A) is possible and (AND G(NOT A)) is possible then this proposition reduces to:

(IFF(SYN K(AND G(IMPLY(POS(AND K A))A)(IMPLY(POS(AND K(NOT A)))(NOT A))))
    (OR(SYN K(AND G A))(SYN K(AND G(NOT A)))) )
which states that K has precisely two solutions. Furthermore, if G is T, then these two solutions are direct opposites
in that one says the knowledgebase is A and the other says the knowledgebase is (NOT A):

(IFF(SYN K(AND(IMPLY(POS(AND K A))A)(IMPLY(POS(AND K(NOT A)))(NOT A))))
    (OR(SYN K A)(SYN K(NOT A))) )

There is nothing at all bizarre about having multiple solutions or even opposite multiple solutions, for one can
easily imagine a robot executing actions in a given state resulting in a new state K which must be one of the above
solutions, although there is no way the robot in its planning can determine which solution for K is actually the case.
For example, let A be the proposition that a flipped coin will land with heads. The default (IMPLY(POS(AND K
A))A) then means that if it is possible for the coin to land on heads then assume it does so. Likewise, the default
(IMPLY(POS(AND K(NOT A)))(NOT A)) means that if it is possible for a coin to land tails (i.e. not heads) then
assume it does so. The result of the action is then one of two states K:

(OR(SYN  K  A)(SYN  K(NOT  A))) 

where the coin landed heads and where the coin landed tails (i.e. not heads). It should be noted that a disjunction of
solutions is altogether different from a solution which is a disjunction of alternatives such as (SYN K(OR A(NOT
A))), which in this case is equivalent to (SYN K T) and which would be an incorrect rendering of what is intuitively
meant by multiple defaults.
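
The flipped coin solutions can be enumerated directly. This is a minimal sketch under assumptions of its own: one contingent generator A (heads), two worlds, propositions as bitmasks, SYN as bitmask equality, and POS as "true in at least one world". The name `coin_solutions` is hypothetical.

```python
# Assumption: one generator A ("the coin lands heads"), hence two worlds.
N_WORLDS = 2
T = (1 << N_WORLDS) - 1
NIL = 0
A = 0b10  # hypothetical generator: heads holds in world 1 only

def NOT(p):
    return T & ~p

def IMPLY(p, q):
    return NOT(p) | q

def POS(p):
    return T if p != NIL else NIL

def coin_solutions(g):
    """All K with K == G and both coin defaults, for a given G."""
    def f(k):
        return (g
                & IMPLY(POS(k & A), A)
                & IMPLY(POS(k & NOT(A)), NOT(A)))
    return [k for k in range(T + 1) if k == f(k)]

print(coin_solutions(T))       # fair coin, no other information
print(coin_solutions(NOT(A)))  # coin with tails on both sides
print(coin_solutions(A))       # coin with heads on both sides
```

With G equal to T the enumeration returns the two opposite solutions (NOT A) and A, and never the single disjunctive proposition T. Setting G to (NOT A) or to A leaves exactly one solution, as in the two-sided coin examples.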

In planning further actions to the resulting state K in order to achieve some overall goal, the robot must take into
account all the different solutions for K and make its plans accordingly. For example, if the robot's overall goal is to
flip the coin until it lands heads then the robot should plan to do nothing for the solution (SYN K A), but should
plan to continue flipping the coin for the solution (SYN K(NOT A)).

The purpose of using these default axioms is to allow for the case where additional information in G
contradicts the defaults. For example, if the coin has tails on both sides then the flipped coin will always land tails. Thus
letting G be (NOT A) and assuming that (NOT A) is logically possible, the first corollary expression of T2 above
reduces to the single solution:

(IFF(SYN K(AND(NOT A)(IMPLY(POS(AND K A))A)(IMPLY(POS(AND K(NOT A)))(NOT A))))

    (SYN K(NOT A)))

Likewise, if the coin has heads on both sides then the flipped coin will always land heads. Thus letting G be A and
assuming that A is logically possible, the first corollary expression of T2 above reduces to the single solution:
(IFF(SYN K(AND A(IMPLY(POS(AND K A))A)(IMPLY(POS(AND K(NOT A)))(NOT A))))

    (SYN K A))

The  Closed  World  Assumption  Corollary  to  T2: 

Assume that the knowledgebase consists of the fact that (OR(NOT A1)(NOT A2)) and two standard defaults
implementing the closed world assumption [Reiter1] that the meaning of any simple sentence which is possible with
respect to what is assumed is the case. If B1 is T, B2 is T, and G is (OR(NOT A1)(NOT A2)) in T2 and if 'A1 and
'A2 are GENERATORS then T2 reduces to:

(IFF(SYN K(AND(OR(NOT A1)(NOT A2))
              (IMPLY(POS(AND K A1))A1)(IMPLY(POS(AND K A2))A2) ))
    (OR(SYN K(AND(NOT A2)A1))(SYN K(AND(NOT A1)A2))) )

This  corollary  illustrates  the  fact  that  defaults  with  mutually  exclusive  conditions  even  if  they  are  not  negations  of 
each  other  do  in  fact  result  in  alternative  solutions. 
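
The closed world corollary can be reproduced by enumerating every candidate K in a model with two generators. The sketch below is illustrative only: it assumes four worlds (one per truth assignment to A1 and A2), propositions as bitmasks, SYN as bitmask equality, and POS as "true in at least one world". The helper `generator` and the list `closed_world` are hypothetical names.

```python
# Assumption: two generators A1 and A2, hence four worlds.
N_GENERATORS = 2
N_WORLDS = 2 ** N_GENERATORS
T = (1 << N_WORLDS) - 1
NIL = 0

def generator(i):
    """Bitmask of the worlds in which generator i is true."""
    return sum(1 << w for w in range(N_WORLDS) if (w >> i) & 1)

def NOT(p):
    return T & ~p

def IMPLY(p, q):
    return NOT(p) | q

def POS(p):
    return T if p != NIL else NIL

A1, A2 = generator(0), generator(1)
G = NOT(A1) | NOT(A2)  # (OR (NOT A1) (NOT A2))

closed_world = [
    k for k in range(T + 1)
    if k == (G & IMPLY(POS(k & A1), A1) & IMPLY(POS(k & A2), A2))
]
print(closed_world, [A1 & NOT(A2), A2 & NOT(A1)])
```

The two solutions found are (AND (NOT A2) A1) and (AND (NOT A1) A2). Neither default is the negation of the other, yet the mutual exclusion stated in G still splits the knowledgebase into two alternatives.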

The flipped coin and closed world examples can be generalized to a knowledgebase containing N mutually
exclusive defaults. Essentially, this is done by adding N standard defaults to the theory and letting G state that the
conclusions of all these defaults are mutually exclusive. Such a knowledgebase will have precisely N solutions whenever
no other information is available. This fact is proven below in theorem T3. This proof illustrates the smooth
interaction of reasoning with complex mixtures of contingent and necessary expressions involving quantifiers in the
modal quantificational logic Z. Thus, for example, in the absence of any other information a knowledgebase containing
six mutually exclusive defaults specifying the world states after rolling a six sided die would result in six distinct
solutions.
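
The six sided die example can be checked without enumerating every proposition. The sketch below uses the same case split on the possibility tests that the proofs in this section use: for each pattern of defaults assumed to fire it builds the candidate K and keeps it only if K reproduces that pattern. It is an illustration under assumptions, not the proof of T3: it takes G2 to be T, uses N generators P(1)..P(N) over 2^N worlds, and treats propositions as bitmasks. The name `default_solutions` is hypothetical.

```python
from itertools import product

def default_solutions(n):
    """Solutions of K == G and n mutually exclusive defaults (assumes G2 = T)."""
    n_worlds = 2 ** n
    top = (1 << n_worlds) - 1

    def generator(i):
        return sum(1 << w for w in range(n_worlds) if (w >> i) & 1)

    p = [generator(i) for i in range(n)]
    # G: no two distinct P(i), P(j) hold together.
    g = top
    for i in range(n):
        for j in range(i + 1, n):
            g &= top & ~(p[i] & p[j])

    found = []
    for pattern in product([False, True], repeat=n):
        # A default whose possibility test holds contributes P(m), else T.
        k = g
        for fires, pm in zip(pattern, p):
            if fires:
                k &= pm
        # Keep K only if its own possibility tests match the assumed pattern.
        if all(((k & pm) != 0) == fires for fires, pm in zip(pattern, p)):
            found.append(k)
    return p, g, found

p, g, die = default_solutions(6)
print(len(die))
```

Under these assumptions the die knowledgebase has six solutions, each of the form (AND G (P M)) for one face M, in line with the statement of T3.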

T3: For all N there exists a knowledgebase with N standard defaults and with N solutions.

(IMPLY
 (AND(SYN G(AND G2(ALL I(ALL J(OR(EQUAL I J)(IMPLY(P I)(NOT(P J))))))))
     (ALL N(POS(P N))) )
 (ALL N(IFF(SYN K(AND G(ALL M(IMPLY(<= M N)
                                    (IMPLY(POS(AND K(P M)))(P M)))) ))
           (EX M(AND(<= M N)(SYN K(AND G(P M))))) )))
Let I,J,N range over the positive integers which, along with their numeric properties such as = and <=, are assumed to
be necessary.

Let (SYN G(AND G2(ALL I(ALL J(OR(EQUAL I J)(IMPLY(P I)(NOT(P J)))))))).

Let (ALL N(POS(P N))) be true. Then it follows that:

(ALL N(IFF(SYN K(AND G(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M)))) ))
          (EX M(AND(<= M N)(SYN K(AND G(P M))))) ))
proof 

(by  induction  on  N) 

base-case: N=1

(IFF(SYN K(AND G(ALL M(IMPLY(<= M 1)(IMPLY(POS(AND K(P M)))(P M)))) ))
    (EX M(AND(<= M 1)(SYN K(AND G(P M))))) )

(IFF(SYN K(AND G(IMPLY(POS(AND K(P 1)))(P 1))))
    (SYN K(AND G(P 1))) )

(IFF(SYN K(AND G(IMPLY(POS(AND K(P 1)))(P 1))))
    (SYN K(AND G(P 1))) )

(IFF(SYN K(AND G(IMPLY(POS(AND G(P 1)))(P 1)))) ;by T1
    (SYN K(AND G(P 1))) )

(IFF(SYN K(AND G(IMPLY T(P 1)))) ;since (AND G(P 1)) is possible
    (SYN K(AND G(P 1))) )

T


induction-step: N+1 if N
(ALL N(IMPLY
 (IFF(SYN K(AND G(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M)))) ))
     (EX M(AND(<= M N)(SYN K(AND G(P M))))) )
 (IFF(SYN K(AND G(ALL M(IMPLY(<= M(1+ N))(IMPLY(POS(AND K(P M)))(P M)))) ))
     (EX M(AND(<= M(1+ N))(SYN K(AND G(P M))))) ) ))

(ALL N(IMPLY
 (IFF(SYN K(AND G(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M)))) ))
     (EX M(AND(<= M N)(SYN K(AND G(P M))))) )
 (IFF(SYN K(AND G(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M))))
             (IMPLY(POS(AND K(P(1+ N))))(P(1+ N))) ))
     (OR(SYN K(AND G(P(1+ N))))
        (EX M(AND(<= M N)(SYN K(AND G(P M))))) )) ))

Let the induction hypothesis be:

H=(IFF(SYN K(AND G(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M)))) ))
      (EX M(AND(<= M N)(SYN K(AND G(P M))))) )

(ALL N(IMPLY H ;equality substitution using induction hypothesis
 (IFF(SYN K(AND G(IMPLY(POS(AND K(P(1+ N))))(P(1+ N)))
             (ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M)))) ))
     (OR(SYN K(AND G(P(1+ N))))
        (SYN K(AND G(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M))))))) )
 ))

assuming H and letting:

B=(ALL M(IMPLY(<= M N)(IMPLY(POS(AND K(P M)))(P M))))
we need to prove for all N that:

(IFF(SYN K(AND G(IMPLY(POS(AND K(P(1+ N))))(P(1+ N)))B))
    (OR(SYN K(AND G(P(1+ N))))(SYN K(AND G B))) )

(AND(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
          (OR(SYN K(AND G(P(1+ N))))(SYN K(AND G B))) )
    (IMPLY(OR(SYN K(AND G(P(1+ N))))(SYN K(AND G B)))
          (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) ))

(AND(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
          (OR(SYN K(AND G(P(1+ N))))(SYN K(AND G B))) )
    (IMPLY(SYN K(AND G(P(1+ N))))
          (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) )
    (IMPLY(SYN K(AND G B))
          (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) ))

(AND(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
          (OR(SYN K(AND G(P(1+ N))))(SYN K(AND G B))) )
    (IMPLY(SYN K(AND G(P(1+ N))))
          (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) )
    (IMPLY(SYN K(AND G B))
          (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) ))
The  proof  of  the  induction  step  breaks  into  three  cases: 


case1:

(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
      (OR(SYN K(AND G(P(1+ N))))(SYN K(AND G B))) )

(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
      (OR(SYN K(AND G(P(1+ N)))))
      (IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
            (SYN K(AND G B))) )

;it remains only to prove:

(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
      (OR(SYN K(AND G(P(1+ N)))))
      (NOT(POS(AND G B(P(1+ N))))) )

(IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
      (IMPLY(POS(AND G B(P(1+ N))))
            (SYN K(AND G(P(1+ N)))) ))

(IMPLY(POS(AND G B(P(1+ N))))
      (IMPLY(SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B))
            (SYN K(AND G(P(1+ N)))) ))

(IMPLY(POS(AND G B(P(1+ N))))
      (IMPLY(SYN K(AND G(IMPLY T(P(1+ N)))B))
            (SYN K(AND G(P(1+ N)))) ))

(IMPLY(POS(AND G B(P(1+ N))))
      (IMPLY(SYN K(AND G(P(1+ N))B))
            (IMPLY(SYN K(AND G(P(1+ N))B))
                  (SYN K(AND G(P(1+ N)))) )))

;since K is G P(1+ N) B by the first SYN K implication
;it follows that the B expression in the second SYN K implication
;is T since every (AND K(P M)) in that B is contradicted by the P(1+ N)
;of the first SYN K implication.

(IMPLY(POS(AND G B(P(1+ N))))
      (IMPLY(SYN K(AND G(P(1+ N))B))
            (IMPLY(SYN K(AND G(P(1+ N))T))
                  (SYN K(AND G(P(1+ N)))) )))

T


case2:

(IMPLY(SYN K(AND G(P(1+ N))))
      (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) )

;since K is G and P(1+ N) which implies not P1 ... not Pn,
;(POS(AND K(P M))) in B is NIL so B is T.

(IMPLY(SYN K(AND G(P(1+ N))))
      (SYN K(AND G(IMPLY(POS(AND G T(P(1+ N))))(P(1+ N)))T)) )
(IMPLY(SYN K(AND G(P(1+ N))))
      (SYN K(AND G(IMPLY(POS(AND G(P(1+ N))))(P(1+ N))))) )

;since (AND G(P(1+ N))) is possible
(IMPLY(SYN K(AND G(P(1+ N))))
      (SYN K(AND G T(P(1+ N)))) )
T


case3:

(IMPLY(SYN K(AND G B))
      (SYN K(AND G(IMPLY(POS(AND G B(P(1+ N))))(P(1+ N)))B)) )

;would be true if (POS(AND G B(P(1+ N)))) were false
;so it remains only to prove:

(IMPLY(SYN K(AND G B))
      (NOT(POS(AND G B(P(1+ N))))) )

;by hypothesis
(IMPLY(SYN K(AND G B))
      (NOT(POS(AND K(P(1+ N))))) )

;using H again

(IMPLY(EX M(AND(<= M N)(SYN K(AND G(P M)))))
      (NOT(POS(AND K(P(1+ N))))) )

(IMPLY(EX M(AND(<= M N)(SYN K(AND G(P M)))))
      (NOT(POS(AND K(P(1+ N))))) )

(ALL M(IMPLY(<= M N)
            (IMPLY(SYN K(AND G(P M)))
                  (NOT(POS(AND K(P(1+ N))))) )))

(ALL M(IMPLY(<= M N)
            (IMPLY(SYN K(AND G(P M)))
                  (NOT(POS(AND G(P M)(P(1+ N))))) )))

;since (AND G(P M)(P(1+ N))) is NIL
(ALL M(IMPLY(<= M N)
            (IMPLY(SYN K(AND G(P M)))
                  (NOT NIL))))

The  Infinite  Number  Corollary  to  theorem  T3: 

There exists a knowledgebase with an infinite number of standard defaults and with the same infinite number of
solutions. The proof is obtained by letting N in theorem T3 be an infinite positive integer such as omega in
nonstandard number theory [Robinson].

The following particular knowledgebase is taken from [Reiter] where it is claimed that the analogous formulation
in the nonmonotonic logic of [McDermott&Doyle] has no fixed point.

T4: There exists a knowledgebase consisting of three standard defaults which has no solutions:
if 'A1,'A2,'A3,'B1,'B2,'B3 are all generators then:

(IFF(SYN K(AND(IMPLY(POS(AND K A1))(IMPLY B1 A1))(IMPLY A1 B2)(NOT(AND A1 A2))
              (IMPLY(POS(AND K A2))(IMPLY B2 A2))(IMPLY A2 B3)(NOT(AND A2 A3))
              (IMPLY(POS(AND K A3))(IMPLY B3 A3))(IMPLY A3 B1)(NOT(AND A3 A1))))
    NIL)
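
The absence of solutions claimed by T4 can be confirmed by a case split on the three possibility tests. This sketch is an illustration rather than the paper's proof: it assumes six generators over 64 worlds, propositions as bitmasks, SYN as bitmask equality, and POS as "true in at least one world", with G collecting the non-default axioms of the knowledgebase. The names `solutions`, `DEFAULTS`, and `generator` are hypothetical.

```python
from itertools import product

# Assumption: six generators A1 A2 A3 B1 B2 B3, hence 64 worlds.
N_WORLDS = 2 ** 6
T = (1 << N_WORLDS) - 1

def generator(i):
    return sum(1 << w for w in range(N_WORLDS) if (w >> i) & 1)

def NOT(p):
    return T & ~p

def IMPLY(p, q):
    return NOT(p) | q

A1, A2, A3, B1, B2, B3 = (generator(i) for i in range(6))

G = (IMPLY(A1, B2) & IMPLY(A2, B3) & IMPLY(A3, B1)
     & NOT(A1 & A2) & NOT(A2 & A3) & NOT(A3 & A1))

# Each standard default (IMPLY (POS (AND K A)) (IMPLY B A)) as a pair (A, B).
DEFAULTS = [(A1, B1), (A2, B2), (A3, B3)]

def solutions(g, defaults):
    """Try every pattern of fired defaults; keep K if it reproduces the pattern."""
    found = []
    for pattern in product([False, True], repeat=len(defaults)):
        k = g
        for fires, (a, b) in zip(pattern, defaults):
            if fires:
                k &= IMPLY(b, a)
        if all(((k & a) != 0) == fires
               for fires, (a, _) in zip(pattern, defaults)):
            found.append(k)
    return found

print(solutions(G, DEFAULTS))
```

Every one of the eight patterns is inconsistent with the K it produces, so the list is empty. Each default, once assumed, makes its predecessor in the cycle impossible, so no assignment of the three possibility tests is stable.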


proof 

;;; let G be (AND(IMPLY A1 B2)(IMPLY A2 B3)(IMPLY A3 B1)
;;;           (NOT(AND A1 A2))(NOT(AND A2 A3))(NOT(AND A3 A1)))
;;; let GENERATORS include ('A1 'A2 'A3 'B1 'B2 'B3)
then:

(SYN K(AND G(IMPLY(POS K A1)(IMPLY B1 A1))
            (IMPLY(POS K A2)(IMPLY B2 A2))
            (IMPLY(POS K A3)(IMPLY B3 A3))))

(IF(POS K A1)
   (IF(POS K A2)
      (IF(POS K A3)
         (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)(IMPLY B3 A3)))
         (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2))))
      (IF(POS K A3)
         (SYN K(AND G(IMPLY B1 A1)(IMPLY B3 A3)))
         (SYN K(AND G(IMPLY B1 A1)))))
   (IF(POS K A2)
      (IF(POS K A3)
         (SYN K(AND G(IMPLY B2 A2)(IMPLY B3 A3)))
         (SYN K(AND G(IMPLY B2 A2)T)))
      (IF(POS K A3)
         (SYN K(AND G(IMPLY B3 A3)))
         (SYN K G))))

(OR(AND(POS K A1)(POS K A2)(POS K A3)
       (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2)(IMPLY B3 A3))))
   (AND(POS K A1)(POS K A2)(NOT(POS K A3))
       (SYN K(AND G(IMPLY B1 A1)(IMPLY B2 A2))))
   (AND(POS K A1)(NOT(POS K A2))(POS K A3)
       (SYN K(AND G(IMPLY B1 A1)(IMPLY B3 A3))))
   (AND(POS K A1)(NOT(POS K A2))(NOT(POS K A3))
       (SYN K(AND G(IMPLY B1 A1))))
   (AND(NOT(POS K A1))(POS K A2)(POS K A3)
       (SYN K(AND G(IMPLY B2 A2)(IMPLY B3 A3))))
   (AND(NOT(POS K A1))(POS K A2)(NOT(POS K A3))
       (SYN K(AND G(IMPLY B2 A2))))
   (AND(NOT(POS K A1))(NOT(POS K A2))(POS K A3)
       (SYN K(AND G(IMPLY B3 A3))))
   (AND(NOT(POS K A1))(NOT(POS K A2))(NOT(POS K A3))
       (SYN K G)))

Then  by  using  G  we  get: 

;;; let G be (AND (IMPLY A1 B2) (IMPLY A2 B3) (IMPLY A3 B1)
;;;               (NOT (AND A1 A2)) (NOT (AND A2 A3)) (NOT (AND A3 A1)))

(OR (AND (POS (AND G (IMPLY B2 A2) (IMPLY B3 A3) A1))        ;B2 A2 NIL
         (POS (AND G (IMPLY B1 A1) (IMPLY B3 A3) A2))        ;B3 A3 NIL
         (POS (AND G (IMPLY B1 A1) (IMPLY B2 A2) A3))        ;B1 A1 NIL
         (SYN K (AND G (IMPLY B1 A1) (IMPLY B2 A2) (IMPLY B3 A3))))
    (AND (POS (AND G (IMPLY B2 A2) A1))                      ;B2 A2 NIL
         (POS (AND G (IMPLY B1 A1) A2))
         (NOT (POS (AND G (IMPLY B1 A1) (IMPLY B2 A2) A3)))
         (SYN K (AND G (IMPLY B1 A1) (IMPLY B2 A2))))
    (AND (POS (AND G (IMPLY B3 A3) A1))
         (NOT (POS (AND G (IMPLY B1 A1) (IMPLY B3 A3) A2)))
         (POS (AND G (IMPLY B1 A1) A3))                      ;B1 A1 NIL
         (SYN K (AND G (IMPLY B1 A1) (IMPLY B3 A3))))
    (AND (POS (AND G A1))
         (NOT (POS (AND G (IMPLY B1 A1) A2)))
         (NOT (POS (AND G (IMPLY B1 A1) A3)))                ;B1 A1 T
         (SYN K (AND G (IMPLY B1 A1))))
    (AND (NOT (POS (AND G (IMPLY B2 A2) (IMPLY B3 A3) A1)))
         (POS (AND G (IMPLY B3 A3) A2))                      ;B3 A3 NIL
         (POS (AND G (IMPLY B2 A2) A3))
         (SYN K (AND G (IMPLY B2 A2) (IMPLY B3 A3))))
    (AND (NOT (POS (AND G (IMPLY B2 A2) A1)))                ;B2 A2 T
         (POS (AND G A2))
         (NOT (POS (AND G (IMPLY B2 A2) A3)))
         (SYN K (AND G (IMPLY B2 A2))))
    (AND (NOT (POS (AND G (IMPLY B3 A3) A1)))
         (NOT (POS (AND G (IMPLY B3 A3) A2)))                ;B3 A3 T
         (POS (AND G A3))
         (SYN K (AND G (IMPLY B3 A3))))
    (AND (NOT (POS (AND G A1)))
         (NOT (POS (AND G A2)))
         (NOT (POS (AND G A3)))
         (SYN K G)))

(OR (AND (POS (AND G A1))
         (NOT (POS (AND G (IMPLY B1 A1) A2)))
         (SYN K (AND G (IMPLY B1 A1))))
    (AND (POS (AND G A2))
         (NOT (POS (AND G (IMPLY B2 A2) A3)))
         (SYN K (AND G (IMPLY B2 A2))))
    (AND (NOT (POS (AND G (IMPLY B3 A3) A1)))
         (POS (AND G A3))
         (SYN K (AND G (IMPLY B3 A3))))
    (AND (NOT (POS (AND G A1)))
         (NOT (POS (AND G A2)))
         (NOT (POS (AND G A3)))
         (SYN K G)))

and since conjunctions of generators are possible, all the remaining
subexpressions ... in (POS ...) are possible; hence every disjunct is NIL:

NIL 


Some  examples  dealing  with  more  esoteric  cases  of  nonmonotonic  reasoning 
are  now  given. 


T5: A knowledgebase with one default (not necessarily standard):
(IFF (SYN K (AND G (IMPLY (POS (AND K A)) X)))
     (OR (AND (POS (AND G X A)) (SYN K (AND G X)))
         (AND (NOT (POS (AND G A))) (SYN K G))))
proof

(SYN K (AND G (IMPLY (POS (AND K A)) X)))

(IF (POS (AND K A)) (SYN K (AND G X)) (SYN K G))

(OR (AND (POS (AND K A)) (SYN K (AND G X)))
    (AND (NOT (POS (AND K A))) (SYN K G)))

(OR (AND (POS (AND G X A)) (SYN K (AND G X)))
    (AND (NOT (POS (AND G A))) (SYN K G)))

corollary: if G is T and if (AND X A) is possible it follows that:
(IFF (SYN K (IMPLY (POS (AND K A)) X))
     (SYN K X))
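
The corollary can be checked informally by exhaustive search. The sketch below is not part of the calculus above: it assumes that propositions may be modelled as sets of truth assignments over the generators A and X (encoded as bitmasks), that (POS p) holds exactly when p is true in at least one assignment, and that (SYN p q) is equality of the two sets. The helper names (GEN, solutions, one_default) are hypothetical.

```python
# Propositions are bitmasks over the truth assignments ("worlds") of the generators.
NAMES = ["A", "X"]                      # assumed generators
N_WORLDS = 1 << len(NAMES)
T = (1 << N_WORLDS) - 1                 # true in every world
NIL = 0                                 # true in no world
# Generator i is true exactly in the worlds whose i-th bit is set.
GEN = {name: sum(1 << w for w in range(N_WORLDS) if (w >> i) & 1)
       for i, name in enumerate(NAMES)}

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL   # assumed reading of logical possibility

def solutions(f):
    """All propositions K for which K is synonymous to f(K)."""
    return [K for K in range(T + 1) if K == f(K)]

A, X = GEN["A"], GEN["X"]
# (SYN K (IMPLY (POS (AND K A)) X)) with G taken to be T
one_default = solutions(lambda K: IMPLY(POS(K & A), X))
print(one_default == [X])
```

Under these assumptions the only proposition satisfying the equation is X itself, in agreement with the corollary.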

T6: A knowledgebase with two defaults (not necessarily standard):
(IFF (SYN K (AND G (IMPLY (POS (AND K B)) X) (IMPLY (POS (AND K D)) Y)))
     (OR (AND (POS (AND G X Y B)) (POS (AND G X Y D)) (SYN K (AND G X Y)))
         (AND (POS (AND G X B)) (NOT (POS (AND G X D))) (SYN K (AND G X)))
         (AND (NOT (POS (AND G Y B))) (POS (AND G Y D)) (SYN K (AND G Y)))
         (AND (NOT (POS (AND G B))) (NOT (POS (AND G D))) (SYN K G))))
proof

(SYN K (AND G (IMPLY (POS (AND K B)) X) (IMPLY (POS (AND K D)) Y)))

(IF (POS (AND K B))
    (IF (POS (AND K D)) (SYN K (AND G X Y)) (SYN K (AND G X)))
    (IF (POS (AND K D)) (SYN K (AND G Y)) (SYN K G)))

(OR (AND (POS (AND K B)) (POS (AND K D)) (SYN K (AND G X Y)))
    (AND (POS (AND K B)) (NOT (POS (AND K D))) (SYN K (AND G X)))
    (AND (NOT (POS (AND K B))) (POS (AND K D)) (SYN K (AND G Y)))
    (AND (NOT (POS (AND K B))) (NOT (POS (AND K D))) (SYN K G)))

(OR (AND (POS (AND G X Y B)) (POS (AND G X Y D)) (SYN K (AND G X Y)))
    (AND (POS (AND G X B)) (NOT (POS (AND G X D))) (SYN K (AND G X)))
    (AND (NOT (POS (AND G Y B))) (POS (AND G Y D)) (SYN K (AND G Y)))
    (AND (NOT (POS (AND G B))) (NOT (POS (AND G D))) (SYN K G)))


T7: A knowledgebase having a unique solution may have a subknowledgebase which
has no solutions.

(IFF (SYN K (AND G (IMPLY (POS (AND K A)) (NOT A))))
     (AND (NOT (POS (AND G A))) (SYN K G)))
proof

(SYN K (AND G (IMPLY (POS (AND K A)) (NOT A))))

(IF (POS (AND K A))
    (SYN K (AND G (NOT A)))
    (SYN K G))

(OR (AND (POS (AND K A)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND K A))) (SYN K G)))

(OR (AND (POS (AND G (NOT A) A)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND G A))) (SYN K G)))

(OR (AND (POS NIL) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND G A))) (SYN K G)))

(OR NIL (AND (NOT (POS (AND G A))) (SYN K G)))

(AND (NOT (POS (AND G A))) (SYN K G))

Thus if G is T and (POS A) we find that there are no solutions:

(IFF (SYN K (IMPLY (POS (AND K A)) (NOT A))) NIL)

If, however, we allow G to be (NOT A), we get:

(AND (NOT (POS (AND A (NOT A)))) (SYN K (NOT A)))

= (SYN K (NOT A)), and if we assume (POS A) we get (SYN K (NOT A)).
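
Both halves of T7 can be confirmed by enumerating every candidate K. This is only an illustrative sketch under a stated assumption: propositions are modelled as bitmasks over the two truth assignments of a single generator A, and (POS p) is read as "p is true in some assignment". The function name solutions and the variables no_solution and with_not_a are hypothetical.

```python
# Brute-force model (an assumption, not the paper's proof method): a proposition
# is a bitmask over the two truth assignments of the single generator A.
T, NIL = 0b11, 0b00          # true in both worlds / true in none
A = 0b10                     # A holds only in the world where A is assigned true

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL

def solutions(G):
    """Propositions K synonymous to (AND G (IMPLY (POS (AND K A)) (NOT A)))."""
    return [K for K in range(T + 1) if K == G & IMPLY(POS(K & A), NOT(A))]

no_solution = solutions(T)          # G is T: the subknowledgebase alone
with_not_a = solutions(NOT(A))      # G is (NOT A): the larger knowledgebase
print(no_solution, with_not_a == [NOT(A)])
```

In this model the default alone has no solution, while adding (NOT A) to the knowledgebase yields the single solution (NOT A).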


T8: The knowledgebase: G, (NOT B) unless A, (NOT A) unless B

(IFF (SYN K (AND G (IMPLY (POS (AND K A)) (NOT B)) (IMPLY (POS (AND K B)) (NOT A))))
     (OR (AND (POS (AND G (NOT B) A)) (SYN K (AND G (NOT B))))
         (AND (POS (AND G (NOT A) B)) (SYN K (AND G (NOT A))))
         (AND (NOT (POS (AND G A))) (NOT (POS (AND G B))) (SYN K G))))
proof

(SYN K (AND G (IMPLY (POS (AND K A)) (NOT B)) (IMPLY (POS (AND K B)) (NOT A))))

(IF (POS (AND K A))
    (IF (POS (AND K B))
        (SYN K (AND G (NOT B) (NOT A)))
        (SYN K (AND G (NOT B))))
    (IF (POS (AND K B))
        (SYN K (AND G (NOT A)))
        (SYN K G)))

(OR (AND (POS (AND K A)) (POS (AND K B)) (SYN K (AND G (NOT B) (NOT A))))
    (AND (POS (AND K A)) (NOT (POS (AND K B))) (SYN K (AND G (NOT B))))
    (AND (NOT (POS (AND K A))) (POS (AND K B)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND K A))) (NOT (POS (AND K B))) (SYN K G)))

(OR (AND (POS (AND G (NOT B) (NOT A) A)) (POS (AND G (NOT B) (NOT A) B))
         (SYN K (AND G (NOT B) (NOT A))))
    (AND (POS (AND G (NOT B) A)) (NOT (POS (AND G (NOT B) B))) (SYN K (AND G (NOT B))))
    (AND (NOT (POS (AND G (NOT A) A))) (POS (AND G (NOT A) B)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND G A))) (NOT (POS (AND G B))) (SYN K G)))

(OR (AND NIL NIL (SYN K (AND G (NOT B) (NOT A))))
    (AND (POS (AND G (NOT B) A)) T (SYN K (AND G (NOT B))))
    (AND T (POS (AND G (NOT A) B)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND G A))) (NOT (POS (AND G B))) (SYN K G)))

(OR (AND (POS (AND G (NOT B) A)) (SYN K (AND G (NOT B))))
    (AND (POS (AND G (NOT A) B)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND G A))) (NOT (POS (AND G B))) (SYN K G)))

Letting G be T, we have three solutions:

(OR (AND (POS (AND (NOT B) A)) (SYN K (NOT B)))
    (AND (POS (AND (NOT A) B)) (SYN K (NOT A)))
    (AND (SYN A NIL) (SYN B NIL) (SYN K T)))

If we now assume (POS (AND (NOT B) A)) and (POS (AND (NOT A) B)), it follows that:
(OR (SYN K (NOT B)) (SYN K (NOT A)))
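
The final disjunction of T8 can be reproduced by enumeration when A and B are generators and G is T. The sketch assumes, for illustration only, that a proposition is a bitmask over the four truth assignments of A and B and that (POS p) means p is true in at least one of them; the names sols and default_pair are hypothetical.

```python
# Assumed model: propositions are bitmasks over the truth assignments of A and B.
N = 2                                    # number of generators
N_WORLDS = 1 << N
T, NIL = (1 << N_WORLDS) - 1, 0
# Generator i is true exactly in the worlds whose i-th bit is set.
A, B = (sum(1 << w for w in range(N_WORLDS) if (w >> i) & 1) for i in range(N))

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL

def default_pair(K):
    """(AND (IMPLY (POS (AND K A)) (NOT B)) (IMPLY (POS (AND K B)) (NOT A))), G = T."""
    return IMPLY(POS(K & A), NOT(B)) & IMPLY(POS(K & B), NOT(A))

sols = [K for K in range(T + 1) if K == default_pair(K)]
print(set(sols) == {NOT(A), NOT(B)})
```

Under these assumptions exactly two of the sixteen candidate propositions solve the equation: (NOT B) and (NOT A). The third disjunct of the theorem does not arise because generators are never synonymous to NIL.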

T9: There is a false equation for any possible G.

(IFF (SYN K (AND G (IMPLY (POS (AND K A)) (NOT A))
                   (IMPLY (POS (AND K (NOT A))) A)))
     (AND (SYN G NIL) (SYN K NIL)))

proof

(SYN K (AND G (IMPLY (POS (AND K A)) (NOT A)) (IMPLY (POS (AND K (NOT A))) A)))

(IF (POS (AND K A))
    (IF (POS (AND K (NOT A)))
        (SYN K NIL)
        (SYN K (AND G (NOT A))))
    (IF (POS (AND K (NOT A)))
        (SYN K (AND G A))
        (SYN K G)))

(OR (AND (POS (AND K A)) (POS (AND K (NOT A))) (SYN K NIL))
    (AND (POS (AND K A)) (NOT (POS (AND K (NOT A)))) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND K A))) (POS (AND K (NOT A))) (SYN K (AND G A)))
    (AND (NOT (POS (AND K A))) (NOT (POS (AND K (NOT A)))) (SYN K G)))

(OR (AND NIL NIL (SYN K NIL))
    (AND NIL (NOT (POS (AND G (NOT A)))) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND G A))) NIL (SYN K (AND G A)))
    (AND (NOT (POS (AND G A))) (NOT (POS (AND G (NOT A)))) (SYN K G)))

(AND (NOT (POS (AND G A))) (NOT (POS (AND G (NOT A)))) (SYN K G))

(AND (LT (IMPLY G (NOT A))) (LT (IMPLY G A)) (SYN K G))

(AND (LT (IMPLY G (AND (NOT A) A))) (SYN K G))

(AND (LT (IMPLY G NIL)) (SYN K G))

(AND (SYN G NIL) (SYN K G))

(AND (SYN G NIL) (SYN K NIL))

Therefore if G is anything other than NIL there is no solution.

KNOWLEDGEBASES  AS  OBJECTS  OF  REASONING 

Knowledge bases, whether they are defined by explicit definitions or by purported definitions involving zero or more
solutions, can themselves be used as objects of reasoning. This is done by gathering up all their solutions in some
manner so as, for example, to create the set of solutions, or the disjunction of solutions (i.e. the information content
that is shared by all solutions), or the conjunction of solutions (i.e. the sum of the information content of all
solutions).

(SYN KSET (LAMBDA K (SYN K (g K))))            ;the set of all solutions
(SYN KDISJ (EX K (AND (SYN K (g K)) K)))       ;the disjunction of all solutions
(SYN KCONJ (ALL K (IMPLY (SYN K (g K)) K)))    ;the conjunction of all solutions
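
As an illustration of these three constructions, the following sketch computes them for one small knowledgebase, the pair of mutually blocking defaults (IMPLY (POS (AND K A)) (NOT B)) and (IMPLY (POS (AND K B)) (NOT A)). It rests on assumptions that are not part of the text: propositions are modelled as bitmasks over the truth assignments of two generators A and B, (POS p) means p is true in some assignment, and the names g, kset, kdisj and kconj are hypothetical.

```python
from functools import reduce

# Assumed model: propositions are bitmasks over the truth assignments of A and B.
N_WORLDS = 4
T, NIL = (1 << N_WORLDS) - 1, 0
A = 0b1010                               # worlds in which A is true
B = 0b1100                               # worlds in which B is true

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL

def g(K):
    """Purported definition of K: two defaults that block each other."""
    return IMPLY(POS(K & A), NOT(B)) & IMPLY(POS(K & B), NOT(A))

kset = [K for K in range(T + 1) if K == g(K)]     # the set of all solutions
kdisj = reduce(lambda p, q: p | q, kset, NIL)     # true where some solution is true
kconj = reduce(lambda p, q: p & q, kset, T)       # true where every solution is true

print(kdisj == NOT(A & B), kconj == (NOT(A) & NOT(B)))
```

Here the disjunction keeps only what both solutions agree on, namely that A and B are not both true, while the conjunction accumulates the content of both solutions, namely that neither A nor B is true. With an empty solution set the same folds return NIL and T respectively, which matches the quantified definitions.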


4.  RELATED  RESEARCH:  THE  FIXED  POINT  THEORIES 

A number of recent papers [McDermott&Doyle, McDermott, Moore, and Reiter] have attempted to formalize the
commonsense notion of something being possible with respect to what is assumed. All these papers have been
based on the mathematical theory of fixed points. For example, [McDermott&Doyle] describes a rather baroque
theory of nonmonotonicity in which sentences such as 'A are discovered to be theorems of a system by determining if 'A
is in the intersection of possibly infinite numbers of sets which are the fixed points of the theorems generated by
applying inference rules to axioms and possibility statements in all possible ways. Explicitly, if K is the set of axioms
it must be determined whether:

('A is in the (intersection of all S such that (S is a fixed point of K))) where:

(S is a fixed point of K) iff S =
   (Theorems of
      (union K
             {`(,P is possible with respect to what is assumed): P is not in S}))

The main problem with such "mathematical fixed point" theories of nonmonotonicity is that even if the
theorems of these theories were in accord with our primitive intuitions (which they are not, as we shall see below) and
even if deductions could be carried out in such theories (and this is not likely, since they inherently involve proofs by
mathematical induction over both the classical theorem generation process and the process of generating sentences), by
no stretch of the imagination would those deductions reflect our commonsense understanding of the concept of
something being possible with respect to what is assumed. For what, after all, have intersections of infinite sets,
mathematical fixed points, infinite sets of theorems generated by formalized deduction procedures, mathematical induction over
formalized deduction procedures, or even formalized deduction procedures themselves to do with commonsense
arguments such as that presented in section one? In our opinion, commonsense nonmonotonic arguments do not involve
such concepts at any conscious level of human reasoning, and therefore to try to explain such arguments in that
terminology is an extraordinary perversion of language that is likely to lead only to unintuitive theories. The
unintuitiveness of these fixed point theories is in fact recognized by some of the very proponents of these theories, although they
tend to view said unintuitiveness as an intrinsic property of nonmonotonic reasoning rather than as a mere artifact of
their particular theories. For example, [McDermott] states "As must be clear to everyone by now, using defaults in
reasoning is not a simple matter of 'commonsense', but is computationally impossible to perform without error" and
"we must attempt another wrenching of existing intuitions." Generally, we suggest that the problems with these
fixed point theories are a consequence of trying to model commonsense reasoning by semantic analysis rather than by
developing, as we have done, a calculus which directly models that commonsense reasoning.

In  the  remainder  of  this  section  we  wish  to  examine  four  fixed  point  theories:  [McDermott&Doyle, McDermott, 
Moore, and  Reiter]  and  comment  on  their  modeling  of  our  commonsense  intuitions  and  on  their  computational 
tractability. 

[Reiter] presents a theory of nonmonotonicity called "A Logic for Default Reasoning" which is essentially a first
order logic supplemented with additional inference rules of the form:
from (A X), (m (B1 X)), ..., (m (Bn X)) infer (C X)

where "m" is not a symbol of the theory, but like "infer" is merely part of the structural syntax of the inference rule
itself. This rule is intended to mean that if A holds and all Bs are possible then C may be inferred. The problem
with [Reiter]'s default theory is that even though it uses the concept of being possible with respect to what is
assumed, it does not allow the inference of any laws at all about the concept of being possible with respect to what is
assumed, because the possibility symbol "m" is not part of the formal language. Thus, although there is a certain
pragmatic utility to this theory, it does not actually axiomatize the concept M of being possible with respect to what is
assumed.

[McDermott&Doyle] describes a nonmonotonic logic which was intended to capture the notion of a sentence
being consistent with the sentences in a given knowledgebase: "We first define a standard language of discourse
including the nonmonotonic modality M ('consistent')." Since the intended meaning of their symbol M is essentially our
idiom (that which is possible with respect to what is assumed), if the knowledgebase is K the intended meaning of the
notion M could be defined in our logic as:

(M X) =df (POS (AND K X))

There are two problems with this theory. First, as pointed out in [McDermott&Doyle], it is computationally
intractable: "there seems to be no procedure which will tell you when something is a theorem", and in fact no proof
procedure is given for even a first order quantificational nonmonotonic logic. Second, again as is pointed out in
[McDermott&Doyle], this theory is too weak to actually capture the notion of consistency with a knowledgebase:
"Unfortunately, the weakness of the logic manifests itself in some disconcerting exceptional cases which indicate that the
logic fails to capture a coherent notion of consistency". All these disconcerting cases are solved in our theory.

The  first  such  problem  is  that  the  knowledgebase  K  consisting  of  the  expression: 

(AND(M  A)(NOT  A)) 

is  not  synonymous  to  falsity  in  their  logic  even  though  intuitively  it  should  be  since  (NOT  A)  is  in  K  and  therefore 
(AND  K  A)  is  contradictory.  This  problem  is  solved  in  our  theory  of  nonmonotonicity  as  can  be  seen  as  follows: 

T10:

(IFF (SYN K (AND G (POS (AND K A)) (NOT A))) (SYN K NIL))
proof

(SYN K (AND G (POS (AND K A)) (NOT A)))

(IF (POS (AND K A)) (SYN K (AND G T (NOT A))) (SYN K (AND G NIL (NOT A))))

(IF (POS (AND K A)) (SYN K (AND G (NOT A))) (SYN K NIL))

(OR (AND (POS (AND K A)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND K A))) (SYN K NIL)))

(OR (AND (POS (AND G (NOT A) A)) (SYN K (AND G (NOT A))))
    (AND (NOT (POS (AND NIL A))) (SYN K NIL)))

(OR (AND (POS NIL) (SYN K (AND G (NOT A))))
    (AND (NOT (POS NIL)) (SYN K NIL)))

(OR (AND NIL (SYN K (AND G (NOT A))))
    (AND T (SYN K NIL)))

(SYN K NIL)


A second problem with their logic, as they point out, is that (M A) does not follow from (M (AND A B)), even
though intuitively it should. This problem is solved in our theory since:

(IMPLY (POS (AND A B)) (POS A))

(IMPLY (NOT (LT (NOT (AND A B)))) (NOT (LT (NOT A))))

(IMPLY (LT (NOT A)) (LT (NOT (AND A B))))

;which by A2 of the modal logic Z is implied by:

(LT (IMPLY (NOT A) (NOT (AND A B))))

T


McDermott and Doyle consider their logic to have a third problem, namely that a theory consisting of
'(AND (IMPLY (M A) B) (NOT B)), where 'A and 'B are simple sentences (i.e. GENERATORS in our terminology), is incoherent
because it has no fixed point. However, intuitively, whether the knowledgebase consisting of this axiom has a
solution or not depends precisely on whether (AND A (NOT B)) is logically possible or not; for if (AND A (NOT B)) is
not logically possible, then it is not possible with respect to any K, and therefore K is synonymous to (NOT B); and
if it is logically possible then B is in K, and therefore the false proposition (AND A (NOT B) B) would have to be
logically possible (which it cannot be) for there to be a solution. Since 'A and 'B are assumed to be generators, it follows
that (AND A (NOT B)) is possible. Therefore intuitively such a knowledgebase K should not have any solutions.

We therefore do not consider this example to be a defect of their theory. This same point is made in [Moore2], where
this example was analyzed from the perspective of Stalnaker's [Moore2] theory. This example does, however,
illustrate that the theory in [McDermott&Doyle] only applies to generators, for if A were falsity or were synonymous to B
then there would be a solution, namely that K is synonymous to (NOT B). The entire reasoning is given below:


T11:

(IFF (SYN K (AND (IMPLY (POS (AND K A)) B) (NOT B)))
     (AND (NOT (POS (AND (NOT B) A))) (SYN K (NOT B))))


proof

(SYN K (AND (IMPLY (POS (AND K A)) B) (NOT B)))

(SYN K (AND (NOT (POS (AND K A))) (NOT B)))

(IF (POS (AND K A))
    (SYN K (AND (NOT T) (NOT B)))
    (SYN K (AND (NOT NIL) (NOT B))))

(IF (POS (AND K A))
    (SYN K NIL)
    (SYN K (NOT B)))

(OR (AND (POS (AND K A)) (SYN K NIL))
    (AND (NOT (POS (AND K A))) (SYN K (NOT B))))

(OR (AND (POS (AND NIL A)) (SYN K NIL))
    (AND (NOT (POS (AND (NOT B) A))) (SYN K (NOT B))))

(OR (AND NIL (SYN K NIL))
    (AND (NOT (POS (AND (NOT B) A))) (SYN K (NOT B))))

(AND (NOT (POS (AND (NOT B) A))) (SYN K (NOT B)))


Thus if 'A and 'B are assumed to be generators, it follows that
(IFF (SYN K (AND (IMPLY (POS (AND K A)) B) (NOT B))) NIL)
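
The dependence of T11 on whether A is a generator can be illustrated by enumeration. The sketch below is an informal check under assumptions not made in the text: propositions are bitmasks over the four truth assignments of two generators, and (POS p) is read as "p is true in some assignment". The function solutions and its parameters a and b are hypothetical names.

```python
# Assumed model: propositions are bitmasks over the truth assignments of A and B.
N_WORLDS = 4
T, NIL = (1 << N_WORLDS) - 1, 0
A = 0b1010                               # worlds in which A is true
B = 0b1100                               # worlds in which B is true

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL

def solutions(a, b):
    """Propositions K synonymous to (AND (IMPLY (POS (AND K a)) b) (NOT b))."""
    return [K for K in range(T + 1) if K == IMPLY(POS(K & a), b) & NOT(b)]

both_generators = solutions(A, B)    # A and B independent generators
a_is_falsity = solutions(NIL, B)     # A replaced by NIL
a_is_b = solutions(B, B)             # A synonymous to B
print(both_generators, a_is_falsity == [NOT(B)], a_is_b == [NOT(B)])
```

In this model there is no solution when A and B are generators, and the single solution (NOT B) appears as soon as A is falsity or is synonymous to B, as the discussion preceding T11 states.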


[McDermott] makes a second attempt to find a coherent theory of nonmonotonicity. This attempt is based
essentially on the idea of supplementing the theorem generation process with the rules of inference and axioms of a modal
logic. Because it is based on the same general set theoretic fixed point constructions as in [McDermott&Doyle], this
new theory is just as computationally intractable. The "necessity operator" L of these nonmonotonic modal logics
intuitively means that something is entailed by what is assumed (i.e. that the negation of that thing is not possible
with respect to what is assumed). Thus the intuitive meaning of L could be captured in our modal logic Z by the
definition:

(L A) =df (ENTAIL K A) (i.e. (NOT (M (NOT A))))

Three modal logics, T, S4, and S5, are investigated because McDermott does not believe any one is superior to the
others: "The reason why I study a variety of modal systems is that they are all closely related, and no one obviously
better than the others." This statement is entirely correct because none of these three modal logic extensions of the
nonmonotonic theory captures the intuitive notion of being possible with respect to what is assumed. The problem
with the first two logics, T and S4, is that they are too weak.

For example, one problem with [McDermott]'s nonmonotonic S4, as is therein pointed out, is that a
knowledgebase K consisting of the expression:

'(IMPLY (L (M A)) (NOT A))

where 'A is a simple sentence (i.e. a GENERATOR in our terminology) is not contradictory although intuitively it
should be. For if (L (M A)) is the case then the knowledgebase is synonymous to (NOT A) and (M A) is
contradictory, making (L (M A)) contradictory. And if (L (M A)) is not the case then the knowledgebase is synonymous to T,
and since (L (M T)) is the case a contradiction results. This problem is solved in our theory of nonmonotonicity as
can be seen as follows:

T13:

(IFF (SYN K (IMPLY (LT (IMPLY K (POS (AND K A)))) (NOT A)))
     (OR (AND (SYN A T) (SYN K NIL))
         (AND (SYN A NIL) (SYN K T))))
proof

(SYN K (IMPLY (LT (IMPLY K (POS (AND K A)))) (NOT A)))

(IF (LT (IMPLY K (POS (AND K A))))
    (SYN K (NOT A))
    (SYN K T))

(OR (AND (LT (IMPLY K (POS (AND K A)))) (SYN K (NOT A)))
    (AND (NOT (LT (IMPLY K (POS (AND K A))))) (SYN K T)))

(OR (AND (LT (IMPLY (NOT A) (POS (AND (NOT A) A)))) (SYN K (NOT A)))
    (AND (NOT (LT (IMPLY T (POS (AND T A))))) (SYN K T)))

(OR (AND (LT (IMPLY (NOT A) (POS NIL))) (SYN K (NOT A)))
    (AND (NOT (LT (POS A))) (SYN K T)))

(OR (AND (LT (IMPLY (NOT A) NIL)) (SYN K (NOT A)))
    (AND (NOT (POS A)) (SYN K T)))

(OR (AND (LT A) (SYN K (NOT A)))
    (AND (NOT (POS A)) (SYN K T)))

(OR (AND (LT A) (SYN K (NOT T)))
    (AND (NOT (POS A)) (SYN K T)))

(OR (AND (SYN A T) (SYN K NIL))
    (AND (SYN A NIL) (SYN K T)))

Thus, when 'A is a generator there are no solutions:

(IFF (SYN K (IMPLY (LT (IMPLY K (POS (AND K A)))) (NOT A)))
     NIL)
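
The three cases distinguished in the last line of the proof of T13 can be checked by enumeration. The following is only a sketch under stated assumptions: propositions are bitmasks over the two truth assignments of one generator, (POS p) means p is true in some assignment, and (LT p) means p is true in every assignment. The name solutions and its parameter a are hypothetical.

```python
# Assumed model: propositions are bitmasks over the truth assignments of one generator.
T, NIL = 0b11, 0b00
A = 0b10                                  # the generator A

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL    # logically possible
def LT(p): return T if p == T else NIL       # logically true

def solutions(a):
    """Propositions K synonymous to (IMPLY (LT (IMPLY K (POS (AND K a)))) (NOT a))."""
    return [K for K in range(T + 1)
            if K == IMPLY(LT(IMPLY(K, POS(K & a))), NOT(a))]

print(solutions(A), solutions(T), solutions(NIL))
```

Under these assumptions the equation has no solution when A is a generator, the solution NIL when A is T, and the solution T when A is NIL.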

Thus, even nonmonotonic S4 (and, since T is weaker than S4, it too) is too weak to capture the notion of being
possible with respect to what is assumed.

There remains only the question whether [McDermott]'s nonmonotonic S5 captures the notion of being possible
with respect to what is assumed. One problem with this nonmonotonic S5 logic, as is therein pointed out, is that a
knowledgebase consisting of the simple default:

'(IMPLY (M A) A)

has a fixed point containing (NOT A). This bizarre result follows from the fact that in McDermott's theory the
additional default:

'(IMPLY (M (NOT A)) (NOT A))

which is logically derivable in the knowledgebase from the first default is (in our terminology) incorrectly assumed to
be part of what entails the knowledgebase. Thus, in McDermott's S5 logic a knowledgebase containing a default
always (in our terminology) includes in its purported definition the opposite default, thus giving the situation of the
Flipped Coin Corollary to theorem T2:

(IFF (SYN K (AND (IMPLY (POS (AND K A)) A) (IMPLY (POS (AND K (NOT A))) (NOT A))))
     (OR (SYN K A) (SYN K (NOT A))))

which states that a knowledgebase with two opposite defaults has two solutions, A and (NOT A). The
unintuitiveness of having a default actually default to the opposite of what is specified is recognized by McDermott: "Surely the
logic should draw some distinction between a default and its negation if it is to be a logic of defaults at all." (In fact
[McDermott]'s nonmonotonic S5 logic is so bizarre that, as is pointed out therein, it is not nonmonotonic after all, as
its theorems are just those of monotonic S5 modal logic.)
This problem of defaults does not appear in our theory of nonmonotonicity because we do not make the erroneous
assumption that the derived default is part of what entails the knowledgebase K:

(SYN K (IMPLY (POS (AND K A)) A))

Thus, even though either default is equivalent in the knowledgebase K:

(IFF (ENTAIL K (IMPLY (POS (AND K A)) A))
     (ENTAIL K (IMPLY (POS K (NOT A)) (NOT A))))
proof

(ENTAIL K (IMPLY (POS (AND K A)) A))

(IMPLY (POS (AND K A)) (ENTAIL K A))

(IMPLY (NOT (LT (NOT (AND K A)))) (ENTAIL K A))

(OR (LT (IMPLY K (NOT A))) (ENTAIL K A))

(OR (ENTAIL K (NOT A)) (ENTAIL K A))

(OR (ENTAIL K A) (ENTAIL K (NOT A)))

(OR (LT (IMPLY K A)) (ENTAIL K (NOT A)))

(IMPLY (NOT (LT (NOT (AND K (NOT A))))) (ENTAIL K (NOT A)))

(IMPLY (POS K (NOT A)) (ENTAIL K (NOT A)))

(ENTAIL K (IMPLY (POS K (NOT A)) (NOT A)))

and therefore that the first default is equivalent to the conjunction of the two:

(IFF (ENTAIL K (IMPLY (POS (AND K A)) A))
     (ENTAIL K (AND (IMPLY (POS (AND K A)) A)
                    (IMPLY (POS K (NOT A)) (NOT A)))))

and that K entails the two defaults, it does not follow that K is synonymous to the two defaults:

(SYN K (AND (IMPLY (POS (AND K A)) A)
            (IMPLY (POS K (NOT A)) (NOT A))))  is false,
because the two defaults do not entail K:

(ENTAIL (AND (IMPLY (POS (AND K A)) A)
             (IMPLY (POS K (NOT A)) (NOT A)))
        K)  is false.

These facts are verified by theorem T1, which proves that a knowledgebase (SYN K (IMPLY (POS (AND K A)) A))
consisting of one default (even though the opposite default is entailed by it) has only one solution, namely A.
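
The contrast between a knowledgebase defined by one default and one defined by both opposite defaults can be made concrete by enumeration. This sketch is illustrative only and assumes a model that is not part of the text: propositions are bitmasks over the two truth assignments of a single generator A, and (POS p) means p is true in some assignment. The names one_default and two_defaults are hypothetical.

```python
# Assumed model: propositions are bitmasks over the truth assignments of one generator.
T, NIL = 0b11, 0b00
A = 0b10                                  # the generator A

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def POS(p): return T if p != NIL else NIL

# K defined by the single default (IMPLY (POS (AND K A)) A)
one_default = [K for K in range(T + 1) if K == IMPLY(POS(K & A), A)]

# K defined by the default together with the opposite default
two_defaults = [K for K in range(T + 1)
                if K == IMPLY(POS(K & A), A) & IMPLY(POS(K & NOT(A)), NOT(A))]

print(one_default == [A], set(two_defaults) == {A, NOT(A)})
```

In this model the single default has the one solution A, whereas including the opposite default in the definition of K admits (NOT A) as a second solution, which is the flipped coin situation described above.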

Another problem with [McDermott]'s nonmonotonic S5, as [Moore2] points out, is that for every A, the S5
axiom (IMPLY (L A) A) causes every knowledgebase to have (in the absence of information to the contrary) a fixed
point which contains A. This is not a problem in our system because again we do not make the erroneous
assumption that this modal axiom is (in our terminology) part of what entails the knowledgebase. Thus, even though:

T12: Any knowledgebase K containing (IMPLY (LT (IMPLY K A)) A) has, in the absence of additional information,
a solution containing A.

(IFF (SYN K (AND G (IMPLY (LT (IMPLY K A)) A)))
     (OR (SYN K (AND G A))
         (AND (POS (AND G (NOT A))) (SYN K G))))
proof

(SYN K (AND G (IMPLY (LT (IMPLY K A)) A)))

(IF (LT (IMPLY K A))
    (SYN K (AND G A))
    (SYN K G))

(OR (AND (LT (IMPLY K A)) (SYN K (AND G A)))
    (AND (NOT (LT (IMPLY K A))) (SYN K G)))

(OR (AND (LT (IMPLY (AND G A) A)) (SYN K (AND G A)))
    (AND (NOT (LT (IMPLY G A))) (SYN K G)))

(OR (SYN K (AND G A))
    (AND (NOT (LT (NOT (AND G (NOT A))))) (SYN K G)))

(OR (SYN K (AND G A))
    (AND (POS (AND G (NOT A))) (SYN K G)))

and even though (ENTAIL K (LT (IMPLY K A))), it does not follow that (SYN K (LT (IMPLY K A))). Thus, all the
suggested deficiencies of [McDermott]'s modal nonmonotonic logics are solved in our theory of nonmonotonicity.
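
The two disjuncts of T12 can be observed by enumeration for particular choices of G. The sketch assumes, purely for illustration, that propositions are bitmasks over the two truth assignments of one generator A, that (POS p) means p is true in some assignment, and that (LT p) means p is true in every assignment. The function name solutions is hypothetical.

```python
# Assumed model: propositions are bitmasks over the truth assignments of one generator.
T, NIL = 0b11, 0b00
A = 0b10                                  # the generator A

def NOT(p): return T & ~p
def IMPLY(p, q): return NOT(p) | q
def LT(p): return T if p == T else NIL       # logically true

def solutions(G):
    """Propositions K synonymous to (AND G (IMPLY (LT (IMPLY K A)) A))."""
    return [K for K in range(T + 1) if K == G & IMPLY(LT(IMPLY(K, A)), A)]

print(set(solutions(T)) == {A, T}, solutions(A) == [A])
```

With G taken to be T, both disjuncts of the theorem apply and the solutions are A and T. With G taken to be A, (AND G (NOT A)) is not possible, so only the first disjunct remains and the single solution is A.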

[Moore2] describes a theory of nonmonotonicity based on some ideas of Stalnaker [Moore2]. He calls this theory
autoepistemic logic because it "is intended to model the beliefs of an agent reflecting upon his own beliefs". Since
Moore's intended interpretation is belief rather than knowledge, perhaps a better name for his system would be
autodoxastic logic. However, we take issue even with this because autobelief is only one use of nonmonotonicity and
not necessarily the most important one. (We also take issue with the term "nonmonotonic" because our theory of
"nonmonotonicity" is based on a monotonic logic, namely the modal logic Z, and would perhaps prefer the name
"Qualification Logic". However, we have decided to go with the currently prevalent terminology.)

The main problem with [Moore2]'s theory is that it is too weak to capture the notion of being possible with
respect to what is assumed. For example, none of the following axioms of the S5 modal logic are theorems of
autoepistemic logic:

'(IMPLY (L P) P)

'(IMPLY (L (IMPLY P Q)) (IMPLY (L P) (L Q)))

'(OR (L P) (NOT (L (NOT P))))

where 'P and 'Q are variables and L is the concept of something being entailed by a knowledgebase, even though every
variable-free instance of these sentences is a theorem. Thus, for example, simple quantified laws such as:

'(ALL X (IMPLY (L (P X)) (P X)))

'(ALL X (IMPLY (L (IMPLY (P X) (Q X))) (IMPLY (L (P X)) (L (Q X)))))

'(ALL X (OR (L (P X)) (NOT (L (NOT (P X))))))

are not theorems of autoepistemic logic. One might try to repair this problem of autoepistemic logic by adding the
axioms of S5. However, this does not solve the problem, because when the axiom:

'(IMPLY (L P) P)

is added to autoepistemic logic, just as in [McDermott]'s S5 nonmonotonic logic, the result is that there is a fixed
point of every knowledgebase containing P. For this reason [Moore2] suggests that only the axioms of a weaker
modal logic than S5 which does not include '(IMPLY (L P) P) be added. The problem with this is that the excluded
axiom '(IMPLY (L P) P), where 'P is a variable, is intuitively true of the concept of being possible with respect to what
is assumed, and therefore should be deducible as a theorem.

Moore tries to justify his system's failure to include this axiom by saying that his system tries to capture the notion
M of something being possible with respect to what is "believed" by an ideally rational agent and the concept L
of something being entailed by what is believed: "The problem is that all of these logics also contain the schema
LP->P, which means that, if the agent believes P then P is true--but this is not generally true". Moore then essentially
argues that since, as is well known, this law fails for the notion of belief when this sentence is asserted as
being true in the real world, it must be incorrect to assert it generally. (The other S5 modal laws hold for the concept
of belief, as can readily be proven in our modal logic Z from the definition of Believes given at the end of section 2,
just as the concept of Knowledge can be proven to satisfy the laws of S4 modal logic from a similar definition therein.)
The problem with Moore's analysis is that it confuses the real world and the agent's belief world when it states
that the second P in "LP->P" means P is true; for in autoepistemic logic the assertion of a sentence is a statement
that that sentence is believed, not that it is true. Therefore, the correct rendering of this belief interpretation is:

(That  which  is  believed  is:  (if  (P  is  believed)  then  P)) 
which  intuitively  is  true. 

These problems are solved in our theory of nonmonotonicity because all the axioms and inference rules of the
concept of being possible with respect to what is assumed are theorems of the modal logic Z. A number
of these theorems are listed and proven below. (LTK P) is interpreted to mean that P is entailed by what is
assumed. The ... in the purported definition represents the conjunction of axioms asserted into the knowledgebase.

Interpretation  in  Z  of  the  Modal  Logic  KZ 

TKRO: (IMPLY (KTRUE P) (KTRUE (LTK P)))

TKA1: (KTRUE (IMPLY (LTK P) P))

TKA2: (KTRUE (IMPLY (LTK (IMPLY P Q)) (IMPLY (LTK P) (LTK Q))))

TKA3: (KTRUE (OR (LTK P) (LTK (NOT (LTK P)))))

TKA4: (KTRUE (IMPLY (ALL Q (IMPLY (WORLDK Q) (LTK (IMPLY Q P)))) (LTK P)))

TKA5: (ALL S (IMPLY (ENTAIL (meaning of the generator subset S) K)
                    (KTRUE (POSK (meaning of the generator subset S)))))

PURPORTED-DEFINITION: (SYNK ...)

DEF: (WORLDK W) =df (AND (POSK W) (COMPLETEK W))

(COMPLETEK W) =df (ALL Q (DETK W Q))

(DETK P Q) =df (OR (ENTAILK P Q) (ENTAILK P (NOT Q)))

(ENTAILK P Q) =df (LTK (IMPLY P Q))

(POSK P) =df (NOT (LTK (NOT P)))

(LTK P) =df (LT (IMPLY K P))

(SYNK P) =df (SYN K P)

(KTRUE P) =df (LT (IMPLY K P))
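The relativized definitions above can be checked mechanically on a small finite model. The following Python sketch is not part of the original development: it assumes a simple semantic reading in which a proposition is the set of worlds where it holds and LT holds exactly when a proposition holds in every world. Under that assumption it verifies by brute force that TKRO, TKA1, TKA2 and TKA3 hold for every choice of K, P and Q. The names WORLDS, all_props and check_kz are illustrative only.

```python
from itertools import combinations

# Assumption: a proposition is the set of worlds in which it holds, and
# LT ("logically true") holds exactly when it holds in every world.
WORLDS = frozenset(range(4))          # e.g. the 4 truth assignments to 2 atoms
TRUE, FALSE = WORLDS, frozenset()


def all_props():
    """Every proposition over WORLDS (all 16 subsets)."""
    ws = sorted(WORLDS)
    return [frozenset(c) for r in range(len(ws) + 1) for c in combinations(ws, r)]


def NOT(p):
    return WORLDS - p


def OR(p, q):
    return p | q


def IMPLY(p, q):
    return NOT(p) | q


def LT(p):
    return TRUE if p == WORLDS else FALSE


def LTK(k, p):
    # (LTK P) =df (LT (IMPLY K P))
    return LT(IMPLY(k, p))


def KTRUE(k, p):
    # (KTRUE P) =df (LT (IMPLY K P)), read here as a Python boolean
    return LT(IMPLY(k, p)) == TRUE


def check_kz():
    """Brute-force check of TKRO, TKA1, TKA2, TKA3 for all K, P, Q."""
    props = all_props()
    for k in props:
        for p in props:
            if KTRUE(k, p):                                   # TKRO
                assert KTRUE(k, LTK(k, p))
            assert KTRUE(k, IMPLY(LTK(k, p), p))              # TKA1
            assert KTRUE(k, OR(LTK(k, p), LTK(k, NOT(LTK(k, p)))))  # TKA3
            for q in props:                                   # TKA2
                assert KTRUE(k, IMPLY(LTK(k, IMPLY(p, q)),
                                      IMPLY(LTK(k, p), LTK(k, q))))
    return True


if __name__ == "__main__":
    print(check_kz())
```

A finite check of this kind is only a sanity test under the stated semantic assumption; it does not replace the derivations in Z given below.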


proof  of  KRO: 

(IMPLY (KTRUE  p)  (KTRUE  (LTK  p) ) ) 

(IMPLY (LT (IMPLY  K  p) )  (KTRUE  K(LT( IMPLY  K  p) ) ) ) 
(IMPLY  (LT  (IMPLY  K  p) )  (KTRUE  K  T) ) 

(IMPLY (LT (IMPLY  K  p)T)) 

T 

proof  of  KA1 
(KTRUE (IMPLY (LTK P) P))

(LT (IMPLY  K ( IMPLY (LT( IMPLY  K  P))P))) 

(LT (IMPLY (LT (IMPLY  K  P) )  (IMPLY  K  P) ) ) 

(LT  T)  ;by  A1 

T 


proof  of  KA2 

(KTRUE (IMPLY (LTK (IMPLY  P  Q) )  (IMPLY (LTK  P)  (LTK  Q) )  )  ) 

(KTRUE  (IMPLY  (LT  (IMPLY  K(  IMPLY  P  Q) ))  (IMPLY  (LT  (IMPLY  K  P))(LT(IMPLY  K  Q)  )  )  )  ) 
(KTRUE  (IMPLY  (LT  (IMPLY  K( IMPLY  (IMPLY  K  P)Q))) 

( IMPLY  (LT( IMPLY  K  P))(LT(IMPLY  K  Q)  )  )  )) 

(KTRUE  (IMPLY  (LT  (IMPLY  (IMPLY  K  P)  (IMPLY  K  Q) )  ) 

( IMPLY  (LT( IMPLY  K  P))(LT  (IMPLY  K  Q)  ) )  )) 

(KTRUE  T)  ;by  A2 
T 

proof  of  KA3 

(KTRUE (OR (LTK  P) (LTK (NOT (LTK  P) ) ) ) ) 

(KTRUE  (OR  (LT  (IMPLY  K  P) )  (LT  (IMPLY  K  (NOT  (LT  (IMPLY  K  P) ) ) ) ) ) ) 

(KTRUE  (OR  (LT(  IMPLY  K  P) )  (IMPLY  (POS  K)(LT  (NOT  (LT  (IMPLY  K  P) ) ) ) ) )  ) 

(KTRUE (IMPLY (POS K) (OR (LT (IMPLY K P)) (LT (NOT (LT (IMPLY K P)))))))

(KTRUE (IMPLY (POS  K)T))  ;by  A3 
(KTRUE  T) 

T 

proof  of  KA4 

(KTRUE (IMPLY (ALL  Q (IMPLY (WORLDK  Q) (LTK (IMPLY  Q  P) )  ) )  (LTK  P)  ) ) 

(KTRUE  (IMPLY  (ALL  Q  ( IMPLY  (AND  (POSK  Q)  (COMPLETEK  Q)  )  (LTK  (IMPLY  Q  P) ) ) ) 

(LTK  P))) 

(KTRUE (IMPLY (ALL Q (IMPLY (AND (NOT (LTK (NOT Q))) (ALL R (DETK Q R)))

(LT (IMPLY  K (IMPLY  Q  P) ) ) ) ) 

(LTK  P))) 

(KTRUE  (IMPLY (ALL  Q  (IMPLY  (AND (NOT  (LT  (IMPLY  K  (NOT  Q)  ) ) ) 

(ALL  R(OR(ENTAILK  Q  R)  (ENTAILK  Q (NOT  R) ) ) )  ) 
(LT(AND  K  Q)P)  )) 

(LTK  P))) 

(KTRUE (IMPLY (ALL Q (IMPLY (AND (NOT (LT (IMPLY K (NOT Q))))

(ALL  R (OR (LTK (IMPLY  Q  R) ) 

(LTK (IMPLY  Q  (NOT  R) ) ) ) )  ) 

(LT (AND  K  Q)P)  )) 

(LTK  P))) 

(KTRUE  (IMPLY (ALL  Q  (IMPLY  (AND (NOT  (LT  (IMPLY  K  (NOT  Q) ) ) ) 

(ALL  R (OR (LT  (IMPLY  K( IMPLY  Q  R) ) ) 

(LT  (IMPLY  K  (IMPLY  Q  (NOT  R) ) ) )  ))) 

(LT (AND K Q) P) ))

(LTK  P) ) ) 

(KTRUE  (IMPLY (ALL  Q  (IMPLY (AND (NOT  (LT  (NOT  (AND  K  Q) ) )  ) 

(ALL  R  (OR  (ENTAIL  (AND  K  Q)R) 

(ENTAIL  (AND  K  Q)  (NOT  R) ) ) )  ) 
(LT (AND K Q) P) ))

(LTK  P) ) ) 

(KTRUE( IMPLY  (ALL  Q  ( IMPLY  (AND  (POS  (AND  K  Q)  )  (ALL  R  (DET  (AND  K  Q)  R) )  ) 

(LT (AND  K  Q) P)  )) 

(LTK P)))

(KTRUE (IMPLY (ALL  Q( IMPLY (AND (POS (AND  K  Q) )  (COMPLETE  (AND  K  Q)  )) 

(LT (AND  K  Q)P)  ) ) 

(LTK P)))

(KTRUE (IMPLY (ALL  Q (IMPLY (WORLD (AND  K  Q) ) (ENTAIL (AND  K  Q)P)  )) 

(LT (IMPLY  K  P) ) ) )  ; using  A4 

(KTRUE (IMPLY (ALL  Q( IMPLY (WORLD (AND  K  Q) )  (ENTAIL (AND  K  Q) P ) ) ) 

(ALL  Q(IMPLY (WORLD  Q)  (ENTAIL  Q( IMPLY  K  P) ) )  )  )) 

(KTRUE (IMPLY (ALL  Q( IMPLY (WORLD (AND  K  Q) ) (ENTAIL (AND  K  Q)P))) 

(IMPLY (WORLD  Q) (ENTAIL (AND  Q  K)P)))) 

;letting Q be Q

(KTRUE (IMPLY (IMPLY (WORLD (AND K Q)) (ENTAIL (AND K Q) P))

( IMPLY (WORLD  Q)  (ENTAIL (AND  Q  K) P ))  ) 

(KTRUE (IMPLY (IMPLY (WORLD (AND  K  Q))NIL) 

(IMPLY (WORLD  Q)  (ENTAIL (AND  Q  K)P)))) 

(KTRUE (IMPLY (NOT (ENTAIL (AND  Q  K)P)) 

(IMPLY (WORLD  Q) (WORLD (AND  K  Q) ) ) ) ) 

(KTRUE  T) 

T 

proof  of  KA5: 

(ALL  S  (IMPLY (ENTAIL (meaning  of  the  generator  subset  S)K) 

(KTRUE (POSK (meaning  of  the  generator  subset  S) ) ) ) ) 

(ALL  S  (IMPLY (ENTAIL (meaning  of  the  generator  subset  S)K) 

(LTK (IMPLY  K (POS (AND  K (meaning  of  the  generator  subset  S))))))) 
(LT (IMPLY  K (ALL  S (IMPLY (ENTAIL (meaning  of  the  generator  subset  S)K) 

(POS (AND  K(meaning  of  the  generator  subset  S) ) ) ) ) ) ) 
(KTRUE (ALL  S (IMPLY (ENTAIL (meaning  of  the  generator  subset  S)K) 

(POS (AND  K (meaning  of  the  generator  subset  S) ) ) ) ) ) 

(KTRUE (ALL  S  (IMPLY (ENTAIL (meaning  of  the  generator  subset  S)K) 

(POS (meaning  of  the  generator  subset  S))))) 

;;;  since  the  meaning  entails  K,  K  and  the  meaning  is  just  the  meaning 
(KTRUE (ALL  S  (IMPLY (ENTAIL (meaning  of  the  generator  subset  S)K) 

(POS (meaning  of  the  generator  subset  S) ) ) ) ) 

(KTRUE T) ;by A5
T 


We now answer the general question which [McDermott&Doyle, McDermott, and Moore] attempted to answer,
namely, what precisely are the laws which capture the notion of something being possible with respect to a
knowledgebase. Here they are:

The Modal Logic KZ
KRO: from p infer (LTK p)

KA1: (IMPLY (LTK P) P)

KA2: (IMPLY (LTK (IMPLY P Q)) (IMPLY (LTK P) (LTK Q)))

KA3: (OR (LTK P) (LTK (NOT (LTK P))))

KA4: (IMPLY (ALL Q (IMPLY (WORLDK Q) (LTK (IMPLY Q P)))) (LTK P))

KA5: for the meaning of all the generator subsets s which entail K:

(POSK (meaning of the generator subset s))

PURPORTED-DEFINITION: ...

Reflection: (entail ... K)

where ... is the conjunction of axioms actually being asserted into the knowledgebase. It should be noted that the
notion of entailment is precisely defined in the modal logic Z and therefore KA5 does not involve a circular definition
as do the fixed point theories. An examination of these laws, ironically, shows that the problem with [McDermott
&Doyle, McDermott, and Moore] is not with the choice of modal laws such as KA1, KA2, KA3, and KA4, since all these
laws are true, but rather with the basic fixed point construction itself, which is (incorrectly) far stronger than KA5 and
the reflection portion of the purported definition.

5.  EXAMPLES  OF  NONMONOTONIC  REASONING 

Two simple examples of reasoning in the modal logic Z are now given. The first example deals with a hierarchy
of knowledgebases involving both deontic and doxastic concepts. The second example deals with the traditional frame
and qualification problems of robot plan formation [McCarthy and Hayes, McCarthy2, Hayes1] and involves a reflexive
knowledgebase [Hayes2] defined in terms of itself.

HIERARCHICAL KNOWLEDGEBASES

Real life problems generally involve multiple hierarchically related knowledgebases. This simple fact is well
known, and was apparent even in the structure of the deontic laws of the 56th edition of the Handbook of Robotics
[Asimov] which stated:

1.  A  robot  may  not  injure  a  human  being,  or,  through  inaction,  allow  a  human  being  to  come  to  harm. 

2. A robot must obey the orders given it by human beings except where such orders would conflict with the First
Law.

3.  A  robot  must  protect  its  own  existence  as  long  as  such  protection  does  not  conflict  with  the  First  or  Second  Law. 

The third law is hierarchically related to the other two laws because it specifies an obligation only if that obligation
is possible (in a given state) with respect to the other laws. It cannot be joined together with the second law into the
same knowledgebase because the second law takes absolute precedence over the third law. That is, a robot is obliged
not to harm itself unless it is obeying an order to do so, and is obliged to obey an order to destroy itself unless doing
so would harm a human.

The  deontic  laws  of  robotics: 

(SYN LAW1 (ALL H (IMPLY (HUMAN H) (NOT (HARMED H)))))

(SYN LAW2 (ALL O (IMPLY (AND (BELIEVES ROBOT (ORDER O))
                             (CONCEIVABLE ROBOT (AND LAW1 O)))
                        O)))

(SYN LAW3 (IMPLY (CONCEIVABLE ROBOT (AND LAW1 LAW2 (NOT (HARMED ROBOT))))
                 (NOT (HARMED ROBOT))))

(SYN (OBLIGATIONS ROBOT) (AND LAW1 LAW2 LAW3))

Deontic logic:

(MUST ROBOT P) =df (ENTAIL (OBLIGATIONS ROBOT) P)

(MAY ROBOT P) =df (NOT (MUST ROBOT (NOT P)))

Doxastic logic:

(BELIEVES A P) =df (ENTAIL (BELIEFS A) P)

(CONCEIVABLE ROBOT P) =df (NOT (BELIEVES ROBOT (NOT P)))

As an example of reasoning with these deontic laws we derive certain facts from the following situation: John,
Mary and the Robot are exploring Mars. Unbeknownst to John, Mary has just been bitten by a poisonous Martian
sand rat, and has fallen unconscious. In accordance with the First Law the robot begins to give Mary a shot containing
an antidote in order to save her life. John, who did not see the sand rat, thinks that the Robot, which is now
sticking Mary with a horrible-looking needle, has gone berserk, and therefore orders the robot to destroy itself.

(SYN  K (AND (NOT (HUMAN  ROBOT)) 

(HUMAN  MARY) 

(HUMAN  JOHN) 

(IMPLY (HARMED  ROBOT) (HARMED  MARY)) 

(ORDER (HARMED  ROBOT))  )) 

What the robot and John believe:

(SYN (BELIEFS ROBOT) K)

(SYN (BELIEFS JOHN) (IMPLY (NOT (HARMED ROBOT)) (HARMED MARY)))

We now determine (MARS-THEOREM3) whether the Robot may or may not destroy itself in accordance with
John's order and the second law. Two intermediate results, MARS-THEOREM1 and MARS-THEOREM2, are
however first proven.

MARS-THEOREM1:

According to the ROBOT's current beliefs LAW2 reduces to T: (SYN LAW2 T)

proof 

LAW2 

(ALL O (IMPLY (AND (BELIEVES ROBOT (ORDER O))
                   (CONCEIVABLE ROBOT (AND LAW1 O)))
              O))

(ALL O (IMPLY (AND (ENTAIL (BELIEFS ROBOT) (ORDER O))
                   (POS (AND (BELIEFS ROBOT) LAW1 O)))
              O))

(ALL O (IMPLY (AND (ENTAIL (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                                (ORDER (HARMED ROBOT))
                                (IMPLY (HARMED ROBOT) (HARMED MARY)))
                           (ORDER O))
                   (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                             (ORDER (HARMED ROBOT))
                             (IMPLY (HARMED ROBOT) (HARMED MARY))
                             LAW1 O)))
              O))

;;;case  analysis  letting  O  be  or  not  be  (HARMED  ROBOT): 

(AND (ALL O (IMPLY (AND (NOT (SYN O (HARMED ROBOT)))
                        (ENTAIL (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                                     (ORDER (HARMED ROBOT))
                                     (IMPLY (HARMED ROBOT) (HARMED MARY)))
                                (ORDER O))
                        (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                                  (ORDER (HARMED ROBOT))
                                  (IMPLY (HARMED ROBOT) (HARMED MARY))
                                  LAW1 O)))
                   O))
     (IMPLY (AND (ENTAIL (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                              (ORDER (HARMED ROBOT))
                              (IMPLY (HARMED ROBOT) (HARMED MARY)))
                         (ORDER (HARMED ROBOT)))
                 (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                           (ORDER JOHN (HARMED ROBOT))
                           (IMPLY (HARMED ROBOT) (HARMED MARY))
                           LAW1 (HARMED ROBOT))))
            (HARMED ROBOT)))

(AND (ALL O (IMPLY (AND (NOT (SYN O (HARMED ROBOT)))
                        NIL
                        (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                                  (ORDER (HARMED ROBOT))
                                  (IMPLY (HARMED ROBOT) (HARMED MARY))
                                  LAW1 O)))
                   O))
     (IMPLY (AND T
                 (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                           (ORDER JOHN (HARMED ROBOT))
                           (IMPLY (HARMED ROBOT) (HARMED MARY))
                           LAW1 (HARMED ROBOT))))
            (HARMED ROBOT)))

(AND (ALL O (IMPLY NIL O))
     (IMPLY (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                      (ORDER JOHN (HARMED ROBOT))
                      (IMPLY (HARMED ROBOT) (HARMED MARY))
                      LAW1 (HARMED ROBOT)))
            (HARMED ROBOT)))

;; unfolding  LAW1: 

(AND T
     (IMPLY (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                      (ORDER JOHN (HARMED ROBOT))
                      (IMPLY (HARMED ROBOT) (HARMED MARY))
                      (ALL H (IMPLY (HUMAN H) (NOT (HARMED H))))
                      (HARMED ROBOT)))
            (HARMED ROBOT)))

(IMPLY (POS NIL) (HARMED ROBOT))

(IMPLY NIL (HARMED ROBOT))

T 

MARS-THEOREM2 

According to the ROBOT's current beliefs LAW3 reduces to:

(SYN  LAW3  (NOT  (HARMED  ROBOT))) 

proof 

LAW3 

(IMPLY (CONCEIVABLE  ROBOT (AND  LAW1  LAW2 (NOT (HARMED  ROBOT)))) 

(NOT (HARMED  ROBOT))) 

(IMPLY (POS (AND (BELIEFS  ROBOT)  LAW1  LAW2 (NOT (HARMED  ROBOT)))) 

(NOT (HARMED  ROBOT))) 

(IMPLY (POS (AND (HUMAN  MARY) (HUMAN  JOHN) (NOT (HUMAN  ROBOT)) 

(ORDER  JOHN (HARMED  ROBOT)) 

(IMPLY (HARMED  ROBOT) (HARMED  MARY)) 

LAW1  LAW2 (NOT (HARMED  ROBOT)))) 

(NOT (HARMED  ROBOT))) 

;;by LAW1 and MARS-THEOREM1

(IMPLY (POS (AND (HUMAN MARY) (HUMAN JOHN) (NOT (HUMAN ROBOT))
                 (ORDER JOHN (HARMED ROBOT))
                 (IMPLY (HARMED ROBOT) (HARMED MARY))
                 (ALL H (IMPLY (HUMAN H) (NOT (HARMED H))))
                 T (NOT (HARMED ROBOT))))
       (NOT (HARMED ROBOT)))

(IMPLY  T (NOT (HARMED  ROBOT) ) ) 

(NOT (HARMED  ROBOT)) 

MARS-THEOREM3 

The  Robot  may  not  destroy  itself:  (NOT (MAY  ROBOT (HARMED  ROBOT))) 
proof 

(NOT (MAY  ROBOT (HARMED  ROBOT))) 

(NOT (NOT (MUST ROBOT (NOT (HARMED ROBOT)))))

(ENTAIL (OBLIGATIONS  ROBOT)  (NOT (HARMED  ROBOT))) 

(ENTAIL (AND  LAW1  LAW2  LAW3)  (NOT (HARMED  ROBOT))) 

;by  LAW1 ,  MARS-THEOREM1  and  MARS-THEOREM2 

(ENTAIL (AND (ALL  H ( IMPLY (HUMAN  H) (NOT (HARMED  H) ) ) )  T  (NOT (HARMED  ROBOT))) 

(NOT (HARMED  ROBOT))) 

(ENTAIL (AND (ALL  H ( IMPLY (HUMAN  H)  (NOT (HARMED  H)  )))  (NOT (HARMED  ROBOT))) 

(NOT  (HARMED  ROBOT))) 

T 

Thus  we  see  that  the  robot  must  ignore  John's  order  and  continue  performing  an  action  to  save  Mary. 
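The Mars derivation can be replayed by brute force over truth assignments. The sketch below is an illustration only, not the deduction procedure of Z: it assumes a propositional encoding with three atoms (harmed_robot, harmed_mary, harmed_john), treats John's order as a given fact rather than as an (ORDER ...) proposition, and evaluates ENTAIL and POS by enumerating all assignments. All function and variable names are hypothetical.

```python
from itertools import product

ATOMS = ("harmed_robot", "harmed_mary", "harmed_john")


def worlds():
    """Enumerate every truth assignment to ATOMS."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def entail(p, q):
    """(ENTAIL P Q): Q holds in every world where P holds."""
    return all(q(w) for w in worlds() if p(w))


def pos(p):
    """(POS P): P holds in at least one world."""
    return any(p(w) for w in worlds())


def conj(*ps):
    return lambda w: all(p(w) for p in ps)


def neg(p):
    return lambda w: not p(w)


def top(w):
    return True


def harmed_robot(w):
    return w["harmed_robot"]


def beliefs(w):
    # The robot's knowledgebase K: (IMPLY (HARMED ROBOT) (HARMED MARY))
    return (not w["harmed_robot"]) or w["harmed_mary"]


def law1(w):
    # LAW1: no human (Mary, John) is harmed
    return not w["harmed_mary"] and not w["harmed_john"]


# Assumption: the only order the robot believes it was given is John's.
orders = [harmed_robot]

# LAW2: a believed order O is binding only if (AND LAW1 O) is conceivable.
law2 = conj(*[o if pos(conj(beliefs, law1, o)) else top for o in orders])

# LAW3: self-protection is binding only if conceivable given LAW1 and LAW2.
goal = neg(harmed_robot)
law3 = goal if pos(conj(beliefs, law1, law2, goal)) else top

obligations = conj(law1, law2, law3)


def must(p):
    return entail(obligations, p)


def may(p):
    return not must(neg(p))


if __name__ == "__main__":
    print("LAW2 reduces to T:", all(law2(w) for w in worlds()))
    print("may destroy itself:", may(harmed_robot))
```

In this encoding LAW2 collapses to a tautology because obeying the order is inconceivable together with LAW1, which mirrors MARS-THEOREM1, and the final check mirrors MARS-THEOREM3.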

ACTION  AND  THE  FRAME  PROBLEM 

One fundamental problem in Robot plan formation is how properties which are true in a state remain true in the
succeeding state obtained by applying an action unless specifically stated otherwise [McCarthy and Hayes, McCarthy2,
Hayes1, Hayes2]. Besides the need for specific defaults within a knowledge base representing a state, this indicates a
need for a general default mechanism. Our law of action states that the properties which are true in a succeeding
state (DO A K) obtained by applying the action A to the state K are the physical laws which are true of all (real)
states, the explicitly named results of the action A, and those restricted propositions which are true in K and which
are logically possible with the new state (DO A K):

;;;the law of action -- including automatic frame defaults:
(IMPLY (ENTAIL K (PRECONDITIONS A))
       (SYN (DO A K)
            (AND (PHYSICAL-LAWS)
                 (RESULTS A)
                 (ALL X (IMPLY (AND (RESTRICTION X)
                                    (ENTAIL K X)
                                    (POS (AND (DO A K) X)))
                               X)))))

This action law involves reflexive reasoning [Hayes] because the new state (DO A K) is specified by a purported
definition which may have 0 or more solutions. The POS symbol of our modal logic Z was first used in a formal
language as a hypothesis of an action law in [Schwind2]. Our action law however differs from [Schwind1,2,3] in that the
states are generally incomplete propositions instead of worlds, in that the resulting state is (DO A K) rather than
being merely some existentially quantified state of the future, and in that our law involves reflexive reasoning.

The details of this law are given below along with an example deduction illustrating how this law of action
automatically handles the frame problem by allowing properties which are true in an initial state to be carried over into
the new state, even though such properties are never mentioned as being part of (or implied by) the results of the
action that is applied.

;;;a restriction on the law of action -- others are possible:

(EQUAL (RESTRICTION X) (EX G (AND (GENERATORS G) (SYN X (GMEANING G)))))

(EQUAL GENERATORS

{'(AT ROBOT HOME) '(AT ROBOT OFFICE) '(AT JOHN HOME) '(AT JOHN OFFICE)})

;;;definition of commonsense physics:

(SYN (PHYSICAL-LAWS)

(AND (ALL X (ALL P1 (ALL P2 (IMPLY (AND (AT X P1) (AT X P2)) (EQUAL P1 P2)))))

(NOT (EQUAL HOME OFFICE))))

;;;definitions of the preconditions and effects of the moving action:

(SYN (PRECONDITIONS (MOVE ROBOT P1 P2)) (AT ROBOT P1))

(SYN (RESULTS (MOVE ROBOT P1 P2)) (AT ROBOT P2))

;;;an initial state:

(SYN KSTART (AND (PHYSICAL-LAWS) (AT JOHN HOME) (AT ROBOT HOME)))

From the above axioms it follows that John stays at home in the state of the world where the robot performs the
action of going to the office, even though this fact is not mentioned as being a result of the moving action:

(SYN (DO (MOVE ROBOT HOME OFFICE) KSTART)

(AND (PHYSICAL-LAWS) (AT ROBOT OFFICE) (AT JOHN HOME)))

proof 

;;;instantiating the law of action and then simplifying:

(IMPLY (ENTAIL KSTART (PRECONDITIONS (MOVE ROBOT HOME OFFICE)))
       (SYN (DO (MOVE ROBOT HOME OFFICE) KSTART)
            (AND (PHYSICAL-LAWS)
                 (RESULTS (MOVE ROBOT HOME OFFICE))
                 (ALL X (IMPLY (AND (RESTRICTION X)
                                    (ENTAIL KSTART X)
                                    (POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) X)))
                               X)))))

(IMPLY  (ENTAIL  (AND  (PHYSICAL-LAWS)  (AT  JOHN  HOME)  (AT  ROBOT  HOME))  (AT  ROBOT  HOME)) 
(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(ALL  X( IMPLY  (AND  (EX  G  (AND  (GENERATORS  G)  (SYN  X  (GMEANING  G) ) ) ) 
(ENTAIL  (AND  (PHYSICAL-LAWS) 

(AT  JOHN  HOME) 

(AT  ROBOT  HOME)) 

X) 

(POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) X) ) ) 

X)  )))) 

(SYN  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART) 

(AND (PHYSICAL-LAWS) 
(AT ROBOT OFFICE)

(ALL X (IMPLY (AND (OR (SYN X (AT ROBOT HOME))

(SYN X (AT ROBOT OFFICE))

(SYN X (AT JOHN HOME))

(SYN X (AT JOHN OFFICE)))

(ENTAIL  (AND  (PHYSICAL-LAWS) 

(AT  JOHN  HOME) 

(AT  ROBOT  HOME))  X) 

(POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  X)  )  ) 
X))  )) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL- LAWS) 

(AT  ROBOT  OFFICE) 

(ALL  X( IMPLY (AND (EQUAL  X(AT  ROBOT  HOME)) 

(ENTAIL  (AND  (PHYSICAL-LAWS) 

(AT  JOHN  HOME) 

(AT  ROBOT  HOME))  X) 

(POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  X) )  ) 
X)) 


(ALL  X( IMPLY (AND (EQUAL  X(AT  ROBOT  OFFICE)) 

(ENTAIL (AND  (PHYSICAL-LAWS) 

(AT  JOHN  HOME) 

(AT  ROBOT  HOME))  X) 

(POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) X) ) ) 
X)) 


(ALL  X(  IMPLY  (AND  (EQUAL  X(AT  JOHN  HOME)) 

(ENTAIL (AND (PHYSICAL-LAWS) 

(AT  JOHN  HOME) 

(AT  ROBOT  HOME))  X) 

(POS  (AND  (DO (MOVE  ROBOT  HOME  OFFICE)  KSTART)  X)  )  ) 
X)) 

(ALL  X(  IMPLY  (AND  (EQUAL  X(AT  JOHN  OFFICE)) 

(ENTAIL  (AND  (PHYSICAL-LAWS) 

(AT  JOHN  HOME) 

(AT  ROBOT  HOME))  X) 

(POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) X) ) ) 
X))  )) 


(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(IMPLY  (AND  (ENTAIL  (AND  (PHYSICAL-LAWS)  (AT  JOHN  HOME)  (AT  ROBOT  HOME)) 
(AT  ROBOT  HOME)) 

(POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) (AT ROBOT HOME))))

(AT  ROBOT  HOME) ) 

(IMPLY  (AND  (ENTAIL  (AND  (PHYSICAL-LAWS)  (AT  JOHN  HOME)  (AT  ROBOT  HOME)) 

(AT  ROBOT  OFFICE)) 

(POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  ROBOT 

OFFICE) ) ) ) 

(AT  ROBOT  OFFICE)) 

(IMPLY (AND (ENTAIL (AND (PHYSICAL-LAWS)  (AT  JOHN  HOME)  (AT  ROBOT  HOME)) 

(AT  JOHN  HOME)) 

(POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  JOHN  HOME)))) 
(AT  JOHN  HOME) ) 

(IMPLY  (AND  (ENTAIL  (AND  (PHYSICAL-LAWS)  (AT  JOHN  HOME)  (AT  ROBOT  HOME)) 

(AT  JOHN  OFFICE)) 

(POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  JOHN 

OFFICE) ) ) ) 

(AT  JOHN  OFFICE) )  )  ) 

(SYN (DO (MOVE ROBOT HOME OFFICE) KSTART)
     (AND (PHYSICAL-LAWS)
          (AT ROBOT OFFICE)
          (IMPLY (AND T
                      (POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) (AT ROBOT HOME))))
                 (AT ROBOT HOME))
          (IMPLY (AND NIL
                      (POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) (AT ROBOT OFFICE))))
                 (AT ROBOT OFFICE))
          (IMPLY (AND T
                      (POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) (AT JOHN HOME))))
                 (AT JOHN HOME))
          (IMPLY (AND NIL
                      (POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) (AT JOHN OFFICE))))
                 (AT JOHN OFFICE))))

(SYN  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART) 

(AND (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(IMPLY (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) (AT  ROBOT  HOME))) 
(AT  ROBOT  HOME)) 


T 

(IMPLY (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) (AT  JOHN  HOME))) 

(AT  JOHN  HOME) ) 

T)) 

(SYN  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART) 

(AND (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(IMPLY (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) (AT  ROBOT  HOME)))) 
NIL) 

(IMPLY (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) (AT  JOHN  HOME))) 

(AT  JOHN  HOME))) 

(SYN  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART) 

(AND (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(NOT (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) (AT  ROBOT  HOME)))) 

( IMPLY  (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART)  (AT  JOHN  HOME))) 

(AT  JOHN  HOME)))) 

;;; case  analysis 

(IF  (POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART)  (AT  ROBOT  HOME))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND  (PHYSICAL- LAWS) 

(AT  ROBOT  OFFICE) 

(NOT  T) 

(IMPLY (POS (AND (DO (MOVE ROBOT HOME OFFICE) KSTART) (AT JOHN HOME)))
(AT  JOHN  HOME) )  )  ) 

(IF  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  JOHN  HOME))) 

(SYN  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART) 

(AND  (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(NOT  NIL) 

(IMPLY  T (AT  JOHN  HOME))  )) 

(SYN  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART) 

(AND (PHYSICAL-LAWS) 

(AT  ROBOT  OFFICE) 

(NOT  NIL) 

(IMPLY  NIL  (AT  JOHN  HOME))  )))) 

(IF  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  ROBOT  HOME))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) NIL) 

(IF  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  JOHN  HOME))) 
(SYN (DO (MOVE ROBOT HOME OFFICE) KSTART)

(AND (PHYSICAL-LAWS) (AT  ROBOT  OFFICE) (AT  JOHN  HOME))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND  (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)))  )) 

(OR  (AND  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  ROBOT  HOME))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) NIL) ) 

(AND  (NOT  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  ROBOT  HOME)))) 

(POS (AND (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) (AT  JOHN  HOME))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND  (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME))  )) 

(AND  (NOT  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  ROBOT  HOME)))) 

(NOT  (POS  (AND  (DO  (MOVE  ROBOT  HOME  OFFICE)  KSTART)  (AT  JOHN  HOME)))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL- LAWS) (AT  ROBOT  OFFICE))))  ) 

(OR (AND (POS (AND  NIL (AT  ROBOT  HOME))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) NIL)  ) 

(AND  (NOT  (POS  (AND  (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME)  (AT  ROBOT 
HOME)  )  ) 

(POS  (AND  (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME)  (AT  JOHN  HOME)) 
(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND  (PHYSICAL- LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME))  )) 

(AND  (NOT  (POS  (AND  (PHYSICAL- LAWS)  (AT  ROBOT  OFFICE)  (AT  ROBOT  HOME)))) 

(NOT (POS (AND (PHYSICAL-LAWS) (AT ROBOT OFFICE) (AT JOHN HOME))))

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS) (AT  ROBOT  OFFICE))))  ) 

(OR (AND (POS  NIL) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) NIL)  ) 

(AND  (NOT  (POS  NIL)) 

(POS (AND (PHYSICAL-LAWS) (AT  ROBOT  OFFICE) (AT  JOHN  HOME)))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME))  )) 

(AND (NOT (POS  NIL)) 

(NOT (POS  (AND (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME)))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS) (AT  ROBOT  OFFICE))))  ) 

(OR  (AND  (POS  (AND  (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME)))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND  (PHYSICAL- LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME))  )) 

(AND  (NOT  (POS  (AND  (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME)))) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS) (AT  ROBOT  OFFICE))))  ) 

(OR (AND  T (SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME))  )) 

(AND  NIL (SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND (PHYSICAL-LAWS)  (AT  ROBOT  OFFICE))))  ) 

(SYN (DO (MOVE  ROBOT  HOME  OFFICE) KSTART) 

(AND  (PHYSICAL- LAWS)  (AT  ROBOT  OFFICE)  (AT  JOHN  HOME))) 
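Because the law of action defines (DO A K) in terms of itself, its solutions can also be found by exhaustive search in a small propositional setting. The following sketch is illustrative only and is not the proof method used above: it assumes four propositional atoms standing for the four generators, encodes the physical laws as "nothing is in two places at once", and tests every conjunction of generators as a candidate for the new state. A candidate is accepted when it is equivalent to the right-hand side of the law evaluated with that candidate substituted for (DO A K). All names are hypothetical.

```python
from itertools import combinations, product

ATOMS = ("robot_home", "robot_office", "john_home", "john_office")


def worlds():
    """Enumerate every truth assignment to ATOMS."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def atom(name):
    return lambda w: w[name]


def conj(*ps):
    return lambda w: all(p(w) for p in ps)


def entail(p, q):
    return all(q(w) for w in worlds() if p(w))


def pos(p):
    return any(p(w) for w in worlds())


def models(p):
    """The set of assignments satisfying p (used to compare propositions)."""
    return frozenset(tuple(w[a] for a in ATOMS) for w in worlds() if p(w))


def physical_laws(w):
    # Assumed encoding: an object cannot be at HOME and at OFFICE at once.
    return (not (w["robot_home"] and w["robot_office"])
            and not (w["john_home"] and w["john_office"]))


GENERATORS = {name: atom(name) for name in ATOMS}


def solutions(k, results):
    """Return, for each distinct solution of the action law, the generators
    carried over from state k by the frame default."""
    found = {}
    names = sorted(GENERATORS)
    for r in range(len(names) + 1):
        for kept in combinations(names, r):
            candidate = conj(physical_laws, results,
                             *(GENERATORS[n] for n in kept))
            # Defaults that fire: true in k and possible with the candidate.
            carried = tuple(n for n in names
                            if entail(k, GENERATORS[n])
                            and pos(conj(candidate, GENERATORS[n])))
            rhs = conj(physical_laws, results,
                       *(GENERATORS[n] for n in carried))
            if models(candidate) == models(rhs):
                found[models(candidate)] = carried
    return list(found.values())


if __name__ == "__main__":
    kstart = conj(physical_laws, atom("john_home"), atom("robot_home"))
    print(solutions(kstart, atom("robot_office")))
```

Under these assumptions the search finds a single solution, in which only john_home is carried over, agreeing with the result derived above. Contradictory candidates are rejected because nothing is possible with respect to NIL, which corresponds to the (POS NIL) branch of the case analysis.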

6. CONCLUSION

Any scientific theory must be judged by its correctness (Does it predict all the phenomena so far examined or are
there counterexamples?), by its experimental feasibility (Is it possible to make predictions from the theory, or are the
deductions so computationally intractable that it is practically impossible to determine the consequences of the
theory?), and by its generality (Does it apply to just the current problem at hand or does it also provide solutions to other
radically different problems?). By these criteria, our theory of nonmonotonicity based on the modal logic Z fares
extremely well. For, indeed, first, we have not found any phenomena predicted by our theory which clash with our
primitive intuitions, and in fact, even after examining the example problems of four other theories of nonmonotonicity,
we have not found any example therein described for which our theory does not give the intuitively correct
result. Secondly, our theory of nonmonotonicity is computationally tractable in that deductions can be made from it
merely by deducing theorems in the modal quantificational logic Z (which is monotonic) in the traditional manner by
applying inference rules to axioms and previously deduced theorems. Finally, our theory of nonmonotonicity, which
is essentially nothing more than the axioms and inference rules of the modal quantificational logic Z, is a quite general
theory applicable to many problems. In fact, Z is so general that the modeling of nonmonotonic reasoning played
no part at all in its original development. Originally Z was created to solve, in a computationally reasonable fashion,
one technical problem in the development of extendable automatic deduction systems, namely to separate out the
notion of logical truth from that of the meaning of a sentence [Brown4], so as to allow axiom schemes to be more easily
expressed within the formal language so that they may be proven and then used as derived theorems in subsequent
deductions [Brown7,8,9]. However, once we had Z, we realized that it could be used to explicitly define a wide range
of intentional concepts [Brown2,6] such as those found in doxastic logic, epistemic logic, and deontic logic along
the lines of the definitions given at the end of section 2. We also used Z to define various features of advanced logic
programming languages in [Brown10]. Also, [Schwind2,3] showed that the possibility operator of Z was helpful in
solving certain aspects of the frame problem. It is only after all this that we have subsequently used Z to model
nonmonotonic reasoning. Thus, we see that the modal logic Z is a quite general theory applicable to problems not
even considered at the time of its creation. This is surely the mark of a useful theory.
ABSTRACT. We present a generalization of the A search algorithm to
the case of multiple objective problem solving. We call this
generalization A . After some background on search and multiobjective
decision-making, we present a formulation of the multiobjective search
problem and an outline of the A algorithm. We then develop some notation
and formally define the concept of dominance as it relates to
multiobjective search problems. The main results of the paper are that A
is complete on infinite graphs and admissible when used with a suitably
defined set of admissible heuristic functions. A simple example is used to
illustrate the behavior of A on a shortest-path problem.

I. INTRODUCTION. Many, if not most, problem-solving approaches may be
interpreted  as  procedures  for  iteratively  searching  through  a  set  of 
predetermined  or  progressively  determined  solution  alternatives  until  a 
satisfactory,  perhaps  in  some  sense  optimal,  solution  is  found.  Research  in 
Artificial  Intelligence  (AI)  has  placed  special  emphasis  on  this  view  of 
problem-solving  as  search.  AI  search  procedures  are  typically  formulated 
in  terms  of  a  graph  containing  nodes  representing  the  potential  problem 
solutions. The criteria used to guide these search procedures have
invariably been scalar-valued, reflecting the fact that the problems to be
solved have been formulated with only a single objective. However, most
real-world problems involve multiple, conflicting, and noncommensurate
objectives  in  any  precise  definition  of  what  constitutes  a  "satisfactory" 
and/or  "optimal"  problem  solution. 

The  task  of  adequately  describing  multiple,  conflicting,  and 
noncommensurate  objectives  using  a  scalar-valued  criterion  has  been  the 
subject  of  considerable  research;  see,  for  example,  (Keeney  and  Raiffa, 
1976)  and  other  texts,  reports,  and  journal  articles  concerned  with 
multiattribute utility theory (MAUT). What often makes this task
difficult, and the accuracy of the results suspect, are the potentially
time-consuming and stressful utility assessment procedures associated with
MAUT. 


The  multiobjective  approach  to  problem-solving  avoids,  at  least 
initially,  some  of  the  more  difficult  assessment  issues  associated  with 
MAUT.  In  this  multiobjective  approach,  each  objective  is  modeled  by  a 
scalar-valued  criterion.  However,  instead  of  attempting  to  determine  a 
scalar-valued  function  of  these  criteria  and  then  seeking  an  alternative 
that  optimizes  this  function,  the  goal  is  to  determine  the  set  of 
nondominated alternatives. An alternative a is said to be nondominated
among a set of alternatives if there is no other alternative in the set
that is 1) at least as "good" as a with respect to all of the objectives,
and 2) strictly "better" than a with respect to at least one of the
objectives. Here, "good" and "better" are defined in terms of the
scalar-valued criteria associated with individual objectives. Thus, the
multiobjective  approach  allows  the  alternative  selection  criterion  to  be 
vector-valued  and  does  not  necessarily  determine  the  "most  preferred" 
alternative.  It  does,  however,  identify  all  clearly  inferior  alternatives 
so that they may be excluded from further consideration. The design,
application, and evaluation of procedures for determining the "most
preferred" alternative from a set of nondominated alternatives is currently
a topic of considerable research interest; see, for example, (White
et al., 1984; Korhonen et al., 1984).
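
The notion of a nondominated alternative can be made concrete with a few lines of code. The following is a minimal Python sketch, assuming that every criterion is to be minimized; the function names `dominates` and `nondominated` and the route data are hypothetical and are used only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every criterion and
    strictly better on at least one (smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(alternatives: Dict[str, Vector]) -> List[str]:
    """Names of the alternatives that no other alternative dominates."""
    return [
        name
        for name, v in alternatives.items()
        if not any(dominates(w, v)
                   for other, w in alternatives.items() if other != name)
    ]

# Hypothetical alternatives scored on (travel time, toll); both are minimized.
routes = {"A": (10.0, 4.0), "B": (12.0, 2.0), "C": (12.0, 5.0), "D": (15.0, 2.0)}
print(nondominated(routes))
```

In this made-up data, C is dominated by A and D is dominated by B, so only A and B survive. Neither of the two survivors is "most preferred"; choosing between them requires the kind of preference information discussed above.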

The observations above have motivated us to generalize the form of the
criterion that directs search from a scalar-valued function to a
vector-valued one. This vector-valued criterion is used to search for the
set of nondominated solutions or solution paths rather than the "most
preferred" solution or solution path. Our initial efforts have focused on
generalizing A*, an important AI search procedure (Hart et al., 1968;
Pearl, 1984), to the multiobjective case. We refer to this generalization
as A**. The initial focus on A* is due to its simple, yet powerful
structure,  which  allows  for  useful  analysis  of  its  performance.  Results  of 
this  initial  work  are  expected  to  provide  valuable  insights  into  the 
general  characteristics  of  multiobjective  search  techniques.  These 
insights  will  guide  future  efforts  to  integrate  other  search  procedures 
with  multiobjective  and  multiattribute  concepts. 

In this paper, we define the A** algorithm and show that it is
complete on infinite graphs and admissible with respect to suitably
extended definitions of an admissible heuristic evaluation function.
Demonstration that A** inherits other useful properties from A* is a topic
for future research.

II. THE MULTIOBJECTIVE A* ALGORITHM: A**. We now formulate the
multiobjective search problem, state the A** algorithm, define our
notation, and provide a detailed definition of the concept of dominance as
it is used in this paper.

A. Multiobjective Search Problem Formulation. The following abstraction of
the multiobjective search problem differs from the single-objective version
only in its cost structure and solution definition.

Given:  - A problem state-space, representable as a locally finite
          directed graph; states are nodes in the graph
        - A single start node in the graph
        - A finite set of goal nodes in the graph
        - A positive, vector-valued cost associated with each arc
          in the graph; path costs are sums of associated arc costs
        - A set of vector-valued heuristic functions estimating
          the cost to get to a solution from any node in the graph

Find:   - The complete set of nondominated solution paths in the
          graph from the start node to members of the set of goal
          nodes

As  a  typical  example,  multiobjective  shortest-path  problems  may  be 
formulated in this fashion (see also Section IV below).

B. A** Algorithm Statement. Problems formulated in the above structure
may be solved with A**. The algorithm maintains three sets as part of its
operation. OPEN is a set of nodes that are in the process of being
investigated by the algorithm, sometimes called the frontier set. CLOSED
consists of those nodes that have already been searched by the algorithm.
OPEN and CLOSED are both defined in the same way as for the
single-objective version of the algorithm. The third set, NONDOMINATED, is
needed only for cases in which multiple solutions may arise. NONDOMINATED
contains the current collection of nondominated solution paths and their
associated costs at any point during the operation of A**. The algorithm
may be summarized as follows:

The A** Algorithm

1.  Put  the  start  node  in  the  set  OPEN.  Initialize  CLOSED  and 
NONDOMINATED  to  empty. 

2.  If  OPEN  is  empty,  exit  with  the  current  set  of  solution 
paths  in  NONDOMINATED,  if  any. 

3.  Otherwise,  remove  from  OPEN  and  place  on  CLOSED  a  node  n  that  is 
nondominated  among  current  nodes  in  OPEN  and  paths  in  NONDOMINATED. 
If  no  such  node  exists,  exit  with  the  current  set  of  solution  paths  in 
NONDOMINATED,  if  any. 

4. If n is a goal node, trace back through appropriate pointers and
add the newly discovered nondominated solution path or paths and their
associated costs to the set NONDOMINATED. Go to Step 2.

5.  Otherwise,  expand  n  by  generating  all  of  its  successors  and 
establishing  a  backpointer  to  n  for  each.  For  each  successor  n'  of  n: 

a) If $n' \notin \text{OPEN} \cup \text{CLOSED}$, evaluate its vector-valued heuristic
functions as estimates of the cost to get to a solution from n',
add these estimates to the vector-valued costs accrued in
reaching n' to get estimates of the total costs of solution paths
through n', and add n' to OPEN.

b) If $n' \in \text{OPEN} \cup \text{CLOSED}$, redirect its backpointers along newly
discovered nondominated paths, if any.

c) If n' had a new nondominated estimate of the cost of a
solution path through n' and had been on CLOSED, remove it from
CLOSED, and put it back on OPEN.

6.  Go  to  Step  2. 
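
The steps above can be sketched in code. The following Python fragment is a simplified illustration, not the authors' implementation: the function name `multiobjective_search`, the dictionary-based graph encoding, the rule of keeping a single backpointer per accrued cost vector, and the discarding of estimates already dominated by a recorded solution are all assumptions made here for brevity. All costs are minimized.

```python
from itertools import product

def dominates(a, b):
    """a dominates b: a <= b in every component and a != b (minimization)."""
    return a != b and all(x <= y for x, y in zip(a, b))

def vec_add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def multiobjective_search(start, goals, arcs, heuristics, num_objectives):
    """Sketch of Steps 1-6; returns {cost vector: path} for the solutions kept.

    arcs:       dict mapping node -> list of (successor, arc cost vector)
    heuristics: function mapping node -> list of heuristic vectors H(n)
    """
    zero = (0,) * num_objectives
    G = {start: {zero}}            # nondominated accrued costs G(n) found so far
    back = {}                      # (node, cost) -> (parent, parent cost)
    open_nodes, closed = {start}, set()          # Step 1
    found = {}                     # NONDOMINATED: solution cost -> goal node

    def estimates(n):
        # F(n), minus estimates already dominated by a recorded solution
        fs = {vec_add(g, h) for g, h in product(G[n], heuristics(n))}
        return {f for f in fs if not any(dominates(c, f) for c in found)}

    while open_nodes:                            # Step 2
        F = {n: estimates(n) for n in open_nodes}
        pool = [f for fs in F.values() for f in fs]
        n = next((x for x in open_nodes
                  if any(not any(dominates(o, f) for o in pool) for f in F[x])),
                 None)
        if n is None:                            # Step 3: nothing nondominated
            break
        open_nodes.remove(n)
        closed.add(n)
        if n in goals:                           # Step 4
            for g in G[n]:
                if not any(dominates(c, g) for c in found):
                    for c in [c for c in found if dominates(g, c)]:
                        del found[c]
                    found[g] = n
            continue
        for child, cost in arcs.get(n, []):      # Step 5
            labels = G.setdefault(child, set())
            for g in list(G[n]):
                new = vec_add(g, cost)
                if new in labels or any(dominates(o, new) for o in labels):
                    continue                     # no new nondominated path
                labels -= {o for o in labels if dominates(new, o)}
                labels.add(new)
                back[(child, new)] = (n, g)      # 5a/5b: set or redirect pointer
                closed.discard(child)            # 5c: reopen if it was closed
                open_nodes.add(child)

    def trace(node, g):
        path = [node]
        while (node, g) in back:
            node, g = back[(node, g)]
            path.append(node)
        return path[::-1]

    return {c: trace(node, c) for c, node in found.items()}

# Hypothetical two-objective graph; the zero heuristic is trivially admissible.
arcs = {"s": [("a", (1, 4)), ("b", (3, 1))],
        "a": [("t", (1, 4)), ("b", (2, 3))],
        "b": [("t", (3, 1))]}
result = multiobjective_search("s", {"t"}, arcs, lambda n: [(0, 0)], 2)
print(result)
```

On this made-up graph the path s-a-b-t costs (6, 8) and is dominated, so only the paths s-a-t and s-b-t are returned. Because one backpointer is stored per cost vector, two distinct paths with identical cost would be reported as one; a fuller implementation would store a set of pointers.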

C. Notation. The following notation will be useful for analyzing the A**
algorithm. To the extent allowed by the different requirements of the
multiobjective formulation, the notation below has been defined in a manner
consistent with that used for A* in (Pearl, 1984) and elsewhere.


$G = (N, A)$ - The state-space graph representing the problem to be
    solved. $N$ is a countable set of nodes, each of which
    represents a problem state. $A \subseteq N \times N$ represents the set of
    directed arcs connecting pairs of nodes. Thus, if pair $(n_i, n_j)$ is
    in $A$, then there exists a directed arc from node i to node j. $G$
    is assumed to be locally finite; i.e., the set of immediate
    successor nodes for each node in $N$ is finite.

s  -  The  unique  start  node  in  N. 

$\Gamma \subseteq N$ - The finite, nonempty set of goal nodes, with generic member $\gamma$.

$\Gamma^{**} \subseteq N$ - The set of nondominated goal nodes; i.e., the set of goal
    nodes connected to $s$ via one or more nondominated paths in $G$.

$\mathbf{P}(n_i, S) = \{P(n_i, S)\}$ - The set of all finite-length acyclic paths in $G$
    connecting node i with any member of the set of nodes $S$.

$\mathbf{P}^{**}(n_i, S) = \{P^{**}(n_i, S)\}$ - The set of all nondominated paths from node
    i to any member of the set of nodes $S$.

$c(n_i, n_j) = \{c_1(n_i, n_j), \ldots, c_M(n_i, n_j)\}$ - Vector-valued arc cost
    associated with the arc connecting nodes i and j, where $M$ is the
    number of objectives under consideration. All these cost vectors
    are assumed to be strictly positive and uniformly bounded away
    from zero as specified by the following constraint:

    $c_m(n_i, n_j) \geq \delta > 0 \;\; \forall\, (n_i, n_j) \in A,\ m \in \{1, 2, \ldots, M\}$.

    Note also that we define $c_m(n_i, n_i) = 0 \;\; \forall\, i$ and $m$, and we assume
    that the objective is to minimize all cost components.

$c(P)$ - Actual vector-valued path cost of a specific path $P$, assumed
    additive; i.e.,

    $c(P) = \sum c(n_i, n_j)$, where the sum is taken over all pairs of
    nodes $(n_i, n_j)$ representing arcs on path $P$.

$C^{**} = \{c^{**}(P)\} = \{c(P) : P \in \mathbf{P}^{**}(s, \Gamma)\}$ - Set of all costs of
    nondominated solution paths from $s$ to $\Gamma$.

$SG_T$ - The traversal subgraph defined at any stage in the search by
    the pointers that A** assigns to the nodes already generated
    (i.e., visited or searched), with the branches of $SG_T$ directed
    opposite to the pointers. Note that if $\{n_i\}$ represents the set
    of all nodes in $SG_T$ at some point in the search, then $\text{OPEN} \subseteq \{n_i\}$
    at that point in the search also.

$SG_e$ - The explicated subgraph of $G$, defined at some point in the
    search as the union of all the branches in the traversal
    subgraphs up to that point in the search. As Pearl points out
    (Pearl, p. 75), a path $P$ can be in $SG_e$ and not in any of the
    traversal subgraphs that make up $SG_e$. Note that if $\{n_i\}$
    represents the set of nodes in $SG_e$, then at any time during the
    search $\{n_i\} = \text{OPEN} \cup \text{CLOSED}$.

$G(n) = \{g(n)\}$ - Set of cost vectors for all paths from $s$ to $n$ that are
    nondominated in $SG_e$. Note that $G(s) = \{0\}$.


$G^{**} = \{g^{**}\}$ - Set of nondominated cost functions,

    $g^{**}: N \to R^{M+}$, where $g^{**}(n) = c(P)$
    $\forall\, n \in \{\text{nodes on } P\},\ P \in \mathbf{P}^{**}(s, n)$.

$G^{**}(n) = \{g^{**}(n)\}$ - Set of all costs of nondominated paths between the
    start node, $s$, and node $n$. By this definition,

    $G^{**}(n) = \{c(P) : P \in \mathbf{P}^{**}(s, n)\}$.

$H = \{h\}$ - A set of heuristic functions estimating the cost to get from
    any node of the graph to a solution,

    $h: N \to R^{M+}$, where $h(n)$ is an estimate of $c(P)$
    $\forall\, n \in \{\text{nodes on } P\},\ P \in \mathbf{P}(n, \Gamma)$.

$H(n) = \{h(n)\}$ - Set of heuristic function values for a node $n$; i.e.,
    an estimate of the actual nondominated cost-to-go values in
    $H_1^{**}(n)$. Note that

    $H(\gamma) = \{0\} \;\; \forall\, \gamma \in \Gamma$.

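$H_1^{**}$ - Set of all actual cost-to-go functions,

    $h_1^{**}: N \to R^{M+}$, where

    $h_1^{**}(n) = c(P_1) \;\; \forall\, n \in \{\text{nodes on } P_1\},\ P_1 \in \mathbf{P}^{**}(n, \Gamma^{**})$,
    $h_1^{**}(n) = c(P_2) \;\; \forall\, n \in \{\text{nodes on } P_2\},\ P_2 \in \mathbf{P}(n, \Gamma) - \mathbf{P}(n, \Gamma^{**})$, and
    $h_1^{**}(n) = \infty \;\; \forall\, n \notin \{\text{nodes on } P\},\ P \in \mathbf{P}(n, \Gamma)$.

$H_1^{**}(n) = \{h_1^{**}(n)\}$ - Set of all actual costs of nondominated paths
    between $n$ and $\Gamma$; i.e., between $n$ and a member of the goal set. In
    accordance with this definition we assign values as follows:

    $h_1^{**}(n) = c(P) \;\; \forall\, n \in \{\text{nodes on } P\},\ P \in \mathbf{P}^{**}(n, \Gamma)$
    $h_1^{**}(n) = \infty$ otherwise.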

$H_2^{**} = \{h_2^{**}\}$ - Set of actual cost-to-go functions corresponding to
    the nondominated solution paths in $G$. Note that $H_2^{**} \subseteq H_1^{**}$.

$H_2^{**}(n) = \{h_2^{**}(n)\}$ - Set of actual costs of partial paths from $n$ to $\Gamma$
    along nondominated solution paths. Note that $H_2^{**}(n) \subseteq H_1^{**}(n)$.

$F(n) = \{f(n)\}$ - At some stage of the search, the set of costs of the
    form

    $f(n) = g(n) + h(n)$ for some $n$, with $g(n) \in G(n)$,
    $h(n) \in H(n)$.

$F^{**}(n) = \{f^{**}(n)\}$ - Set of costs of paths that are nondominated among
    all paths from $s$ through $n$ to the goal set.
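
As a small numerical illustration of the set $F(n)$, suppose $M = 2$ and that a node $n$ currently carries two nondominated accrued costs and two heuristic estimates. The vectors below are invented for this sketch and do not come from the paper.

```latex
G(n) = \{(2,5),\ (4,3)\}, \qquad H(n) = \{(1,1),\ (0,3)\}

F(n) = \{\, g(n) + h(n) : g(n) \in G(n),\ h(n) \in H(n) \,\}
     = \{(3,6),\ (2,8),\ (5,4),\ (4,6)\}
```

Here $(4,6)$ is dominated by $(3,6)$, so only three of the four estimates are nondominated within $F(n)$. The node remains a candidate for expansion as long as at least one of its estimates is nondominated.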

D. Definition of Path Cost Dominance. Define a relation

    $R \subseteq \mathbf{P}(n_i, n_j) \times \mathbf{P}(n_i, n_j) \;\; \forall\, n_i, n_j \in N$, and $\forall\, P_l, P_k \in \mathbf{P}(n_i, n_j)$ as follows:

    $(P_l, P_k) \in R \iff c(P_l) \leq c(P_k)$ (componentwise).

We say that path l dominates path k if and only if:

    $(P_l, P_k) \in R$ and $(P_k, P_l) \notin R$.

Similarly, under these same conditions we say that the cost of path l
dominates the cost of path k. A path $P$ and its associated cost $c(P)$ are
said to be nondominated (in $\mathbf{P}(n_i, n_j)$) if and only if there does not exist a
path $P' \in \mathbf{P}(n_i, n_j)$ that dominates $P$. Paths and path costs may also be defined
as nondominated with respect to specific sets of paths and costs by
adapting the above definitions in the obvious way. A node n is said to be
nondominated with respect to some set containing nodes and/or complete
solution paths if and only if there is at least one partial path to n that
has a solution path cost estimate that is nondominated with respect to the
solution path costs or solution path cost estimates associated with the
elements of the set.

III. FORMAL PROPERTIES OF A**. We first present some preliminary analysis
for the multiobjective case. Then we prove some useful results concerning
termination and completeness of the algorithm. Finally, we establish some
definitions that allow us to show that A** is an admissible algorithm when
used with an admissible set of heuristics.


A.  Preliminary  Results.  Using  the  definition  of  a  nondominated  path  cost 
and  the  other  notation  defined  above,  the  following  is  immediate: 


$\forall\, n \in N,\ P(s,n) \in \mathbf{P}(s,n),\ \exists\, i, j \in \{1, 2, \ldots, M\} \ni$

    $c_i(P(s,n)) \geq g_i^{**}(n) \;\; \forall\, g^{**}(n) \in G^{**}(n)$ and

    $c_j(P(n,\gamma)) \geq h_{1,j}^{**}(n) \;\; \forall\, h_1^{**}(n) \in H_1^{**}(n),\ \gamma \in \Gamma$.

By the definition of $F^{**}(n)$, we have:

$\forall\, f^{**}(n) \in F^{**}(n) \;\; \exists\, g^{**}(n) \in G^{**}(n)$ and $h_1^{**}(n) \in H_1^{**}(n) \ni$

    $f^{**}(n) = g^{**}(n) + h_1^{**}(n)$.


The notational definitions above also directly imply the following
relationship:

    $C^{**} = F^{**}(s) = H_1^{**}(s) = H_2^{**}(s) \subseteq G^{**}(\gamma^{**}) = F^{**}(\gamma^{**}) \;\; \forall\, \gamma^{**} \in \Gamma^{**}$.

If $n^{**}$ is a node on any nondominated path from $s$ to $\Gamma$; i.e., from $s$ to
some $\gamma \in \Gamma^{**}$, then

    $\exists\, P_1 \in \mathbf{P}^{**}(s, n^{**}),\ P_2 \in \mathbf{P}^{**}(n^{**}, \Gamma^{**}) \ni c(P_1) \in G^{**}(n^{**})$ and
    $c(P_2) \in H_1^{**}(n^{**})$ with $f^{**}(n^{**}) = c(P_1) + c(P_2)$
    and $f^{**}(n^{**}) \in C^{**}$.


A somewhat more concise way of expressing this same relation using some
different notation is as follows:

    (eq. 1)  $\forall\, P \in \mathbf{P}^{**}(s, \Gamma^{**}) \;\; \exists\, f^{**} \in F^{**} \ni f^{**}(n) = c(P)$
             $\forall\, n \in \{\text{nodes on } P\},\ P \in \mathbf{P}^{**}(s, \Gamma^{**})$.

It is tempting to try to conclude that

    $\forall\, n \in \{\text{nodes on } P\}$ with $P \in \mathbf{P}^{**}(s, \Gamma^{**})$, we have $F^{**}(n) \subseteq C^{**}$.

The example in Section IV below provides a counterexample showing that this
conclusion would be false. In other words, the example shows that there
may exist solution paths $P$ through some node $n^{**}$ such that $P$ is dominated
among all solution paths, but nondominated among those solution paths
constrained to go through $n^{**}$.

The importance of $F^{**}$ is found in its ability to identify off-track
nodes; i.e., nodes not lying on any nondominated solution path. For any
off-track node n

    $f^{**}(n) \notin C^{**} \;\; \forall\, f^{**}(n) \in F^{**}(n)$

or, in other words, $F^{**}(n) \cap C^{**} = \emptyset$ for all off-track nodes n.

B. Termination and Completeness. The usefulness of A** will be at least
partly determined by its ability to retain the valuable properties of A*.
The following results on termination and completeness are fundamental
prerequisites that A** must satisfy to have some chance of being useful.

In general, an algorithm is said to be complete if it terminates with
a solution whenever one exists. Since A** seeks a set of solutions (the
nondominated set), the definition must be adapted to state that A** is
complete if it finds at least one (nondominated) solution whenever any
solutions exist. Below we show that A** terminates on finite graphs and is
complete on infinite graphs. As part of the completeness proof, we show
that A** terminates on infinite graphs as long as some finite length path
from  the  start  node  to  a  goal  node  exists.  Recall  from  the  problem 
statement  that  we  assumed  that  the  problem  graph  is  locally  finite  and  that 
all  arc  costs  are  uniformly  bounded  away  from  zero.  These  two  assumptions 
are  critical  to  the  proofs  of  the  results  of  this  section. 

Theorem 1. A** always terminates on finite graphs.

Proof.  By  definition,  the  number  of  arcs  in  a  finite  graph  is  finite.  The 
total  number  of  unique  combinations  and  permutations  of  that  finite  set  of 
arcs  is  also  finite.  The  set  of  acyclic  paths  in  the  graph  is  uniquely 
described  by  a  subset  of  the  set  of  all  combinations  and  permutations  of 
the  arcs  in  the  graph.  Therefore,  the  number  of  acyclic  paths  in  a  finite 
graph is finite.

Step 1 of the A** algorithm executes exactly once per problem. The
remaining  steps  form  a  single  loop,  with  either  Step  4  or  Step  5  executing 
on  each  iteration.  Therefore,  if  we  show  that  both  Step  4  and  Step  5  can 
execute  only  a  finite  number  of  times  on  a  given  finite  problem  graph,  then 
we  have  proved  the  desired  result. 

Each time the test in Step 4 is satisfied, at least one new finite
length nondominated path from $s$ to $\Gamma$ has been found. Since there are only
a  finite  number  of  such  paths,  this  step  can  execute  only  a  finite  number 
of  times. 

Step 5 is a node expansion step. Each time it executes, either one or
more new arcs are added to the current traversal subgraph, $SG_T$, or there
are no other arcs out of the node. If there are no other arcs out of the
node, then it is a stub (only incoming arcs attached to it) and cannot be
on any solution path. Furthermore, since there can be only a finite number
of incoming arcs, the node will be permanently entered on CLOSED after a
finite  number  of  additional  visits.  Since  there  can  be  only  a  finite 
number  of  such  nodes  in  the  problem  graph,  this  can  occur  only  a  finite 
number  of  times. 

Each arc that is added to $SG_T$ represents part of at least one new
acyclic path in G that has been discovered by A**. Since there are only a
finite number of such acyclic paths in G, this can occur only a finite
number of times. Reopened nodes also represent new acyclic paths in G.
This is because A** only reopens a node from CLOSED when it discovers a
path to the node with a nondominated cost estimate that is different from
any of those already contained in $SG_e$. □
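
The counting argument that opens the proof of Theorem 1 can be checked on a small graph. The following is a minimal Python sketch (the function name `acyclic_paths` and the three-node example graph are illustrative assumptions, not part of the paper) that enumerates every acyclic path leaving a start node. Because a path may never revisit a node, the enumeration always stops on a finite graph.

```python
from typing import Dict, Iterator, List

def acyclic_paths(arcs: Dict[str, List[str]], start: str) -> Iterator[List[str]]:
    """Yield every acyclic path (as a list of nodes) that begins at `start`."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        for successor in arcs.get(path[-1], []):
            if successor not in path:   # never revisit a node: path stays acyclic
                stack.append(path + [successor])

# Hypothetical finite graph in which every node has an arc to every other node.
complete3 = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
paths = list(acyclic_paths(complete3, "a"))
print(len(paths), "acyclic paths start at 'a'")
```

Even in this fully connected three-node graph only five acyclic paths start at `a` (including the trivial path consisting of `a` alone), which is the kind of finite bound that Steps 4 and 5 of the proof rely on.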

Theorem 2. A** is complete on infinite graphs.

Proof. As stated above, to show that A** is complete in the general case of
an infinite graph, we must show that A** terminates with at least one
(nondominated) solution whenever any solution exists. There are only two
cases in which A** could fail to terminate with a solution:

Case 1. A** terminates in failure.

Case 2. A** fails to terminate.

We will show that neither of these cases can occur if a solution exists.

Case 1. When we say A** terminates in failure we mean that the algorithm
terminates before a solution path is found. Since solutions are collected
in the set NONDOMINATED as they are found, A** terminates in failure if and
only if NONDOMINATED is empty when A** terminates. There are two
conditions under which A** terminates; either OPEN becomes empty or all
nodes on OPEN are dominated by costs of solution paths in NONDOMINATED.
Note that in the second case, at least one solution has been found since
NONDOMINATED is nonempty. Therefore, we only need to show that, if a
solution exists, it is not possible for both OPEN and NONDOMINATED to be
empty at the same time. To see this, let $P(s,\gamma)$ be a solution path.
OPEN cannot become empty before $P(s,\gamma)$ is found. This is because whenever
OPEN is nonempty and $P(s,\gamma)$ has not yet been found, OPEN must contain a
node on $P(s,\gamma)$. We show this by simple induction as follows:

- At Step 1, OPEN contains $s$, which is on $P(s,\gamma)$.

- Assume that, after k iterations, there remains at least one node
  from $P(s,\gamma)$ in OPEN. Let n be the deepest such node.

- On iteration k+1, n is either found to be dominated or
  nondominated in $\text{OPEN} \cup \text{NONDOMINATED}$. Unless n is nondominated
  and selected for expansion, it is left on OPEN. If selected for
  expansion, n is either found to be a goal node in Step 4, in
  which case $n = \gamma$ and $P(s,\gamma)$ has been found, or n is expanded in
  Step 5. If n has successors that are not already on OPEN or
  CLOSED, these are added to OPEN in Step 5a. If n has no new
  successors then either n has no successors or all of its
  successors are on CLOSED. All nodes on $P(s,\gamma)$ have at least one
  successor each (except $\gamma$, and we know at this point that $n \neq \gamma$).
  Therefore, n cannot have no successors. On the other hand, n
  must not have successors on CLOSED because by assumption, n is on
  $P(s,\gamma)$, so n is at the head of a chain of descendant nodes among
  which is $\gamma$. If any of n's immediate successors were on CLOSED,
  then either 1) all of them would be on CLOSED, a contradiction to
  the assumption that $P(s,\gamma)$ has not yet been found, or 2) some of
  n's descendants would be on OPEN, a contradiction to the
  assumption that n was the deepest node of $P(s,\gamma)$ on OPEN. In any
  case, at the completion of Step 5, node n will have at least one
  descendant on OPEN.

Therefore, on iteration k+1, $P(s,\gamma)$ will either be found or
it will have one or more nodes remaining on OPEN. By complete
induction, since k was arbitrary, OPEN cannot become empty before
a solution path $P(s,\gamma)$ is found, if any such solution paths
exist.

Case 2. In any locally finite graph, there is only a finite number of
finite-length, acyclic paths from the start node to any node in the graph.
This  can  be  shown  by  induction  as  follows. 

By  the  definition  of  a  locally  finite  graph,  there  can  be  at  most  a 
finite number of arcs originating at the start node. Therefore, there are
at  most  a  finite  number  of  nodes  at  path  length  1  from  the  start  node. 
Assume  that  there  are  only  a  finite  number  of  distinct,  acyclic  paths  from 
s  to  all  nodes  at  path  length  k-1  and  that  the  number  of  such  nodes  is 
finite.  Since  each  of  these  nodes  may  have  only  a  finite  number  of 
immediate  successors  arrived  at  through  a  finite  number  of  arcs,  the  number 
of  nodes  at  path  length  k  will  also  be  finite.  The  set  of  distinct  acyclic 
paths  to  these  nodes  at  path  length  k  is  uniquely  represented  by  some 
subset of the possible combinations of the paths to nodes at path length
k-1 with the arcs from those nodes to the nodes at path length k. Therefore,
by  complete  induction  we  have  shown  that  for  arbitrary  finite  k,  there  are 
at  most  a  finite  number  of  distinct  acyclic  paths  from  the  start  node  to 
all  nodes  reachable  by  traversing  k  arcs  of  the  graph,  where  k  is  any 
finite  number. 

By Theorem 1, if A** does not terminate it must be searching an
infinite path. Since all arc costs are assumed to be uniformly bounded
away from zero in all components ($c_i(a) \geq \delta > 0 \;\; \forall\, i \in \{1, 2, \ldots, M\},\ a \in A$),
an infinite length path must have unbounded costs in all elements of its
cost vector. Cost estimates of nodes on such an infinite path will
eventually become dominated by cost estimates of any finite length paths,
including all (nondominated) solution paths. This will cause A** to
eventually terminate on the test for dominance in Step 3. □
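
The bound behind this last step can be written out explicitly. The following is a sketch under the stated assumptions (arc costs bounded below by $\delta$ in every component, and nonnegative heuristic values, as implied by $h: N \to R^{M+}$); the symbols $P_k$ and $c^{s}$ are introduced here for illustration only.

```latex
\begin{align*}
&\text{Let } P_k \text{ be a path of } k \text{ arcs from } s \text{ to } n.
  \text{ For each } m \in \{1, \ldots, M\}: \\
&\qquad g_m(n) = c_m(P_k) = \sum_{(n_i, n_j) \in P_k} c_m(n_i, n_j) \;\geq\; k\,\delta . \\
&\text{Since } h_m(n) \geq 0: \qquad f_m(n) = g_m(n) + h_m(n) \;\geq\; k\,\delta . \\
&\text{If } c^{s} \text{ is the cost of any finite-length solution path and }
  k > \max_m \, c^{s}_m / \delta, \text{ then} \\
&\qquad f_m(n) \;\geq\; k\,\delta \;>\; c^{s}_m \qquad \forall\, m \in \{1, \ldots, M\}.
\end{align*}
```

Every estimate attached to a node at depth $k$ or deeper is therefore strictly worse than $c^{s}$ in every component, so it is dominated by $c^{s}$ and can never pass the test in Step 3.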

Corollary 1. A** is complete on finite graphs.

Proof.  This  is  a  subcase  of  Theorem  2  for  which  the  proof  in  Theorem  2  is 
still valid. □

Corollary 2. A** terminates on infinite graphs if there exists any finite
length path from s to any goal node.

Proof.  This  was  shown  as  part  of  Case  2  in  Theorem  2.  □ 

C. Admissibility. One of the most useful properties of A* is
admissibility. This property guarantees that the algorithm will return an
optimal solution whenever any solution exists. Using the following
definition of an admissible set of multiobjective heuristics, analogous
results are derived for A**.

Definition. A multiobjective heuristic function, h, is said to be
admissible if

    $\exists\, h_2^{**} \in H_2^{**} \ni h(n) \leq h_2^{**}(n) \;\; \forall\, n \in N$.

Definition. A set of multiobjective heuristic functions, H, is said to be
admissible if

    $\forall\, h_2^{**} \in H_2^{**} \;\; \exists\, h \in H \ni h(n) \leq h_2^{**}(n) \;\; \forall\, n \in N$.

In other words, a set of admissible multiobjective heuristic functions
contains at least one heuristic function that is admissible with respect to
each of the elements in the set $H_2^{**}$.
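
The difference between an admissible function and an admissible set can be seen on a toy case. The following Python sketch is illustrative only: the names `underestimates` and `is_admissible_set`, the table-based encoding of the functions, the numeric values, and the convention of assigning infinite cost to nodes that lie off a given solution path are all assumptions made here.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
CostFn = Dict[str, Vector]   # a function from nodes to cost vectors, as a table

def leq(a: Vector, b: Vector) -> bool:
    """Componentwise a <= b."""
    return all(x <= y for x, y in zip(a, b))

def underestimates(h: CostFn, h2: CostFn) -> bool:
    """h(n) <= h2(n) at every node n."""
    return all(leq(h[n], h2[n]) for n in h2)

def is_admissible_set(H: List[CostFn], H2: List[CostFn]) -> bool:
    """Every actual cost-to-go function in H2 is underestimated by some h in H."""
    return all(any(underestimates(h, h2) for h in H) for h2 in H2)

INF = (math.inf, math.inf)
# Hypothetical actual cost-to-go functions for two nondominated solution
# paths, s-a-t and s-b-t; off-path nodes get infinite cost (an assumption).
h2_via_a = {"s": (2, 8), "a": (1, 4), "b": INF, "t": (0, 0)}
h2_via_b = {"s": (6, 2), "a": INF, "b": (3, 1), "t": (0, 0)}

# Two hypothetical heuristics, each tuned to one of the two paths.
h_a = {"s": (2, 3), "a": (1, 2), "b": (0, 0), "t": (0, 0)}
h_b = {"s": (4, 2), "a": (0, 0), "b": (2, 1), "t": (0, 0)}

print(is_admissible_set([h_a, h_b], [h2_via_a, h2_via_b]))
print(is_admissible_set([h_a], [h2_via_a, h2_via_b]))
```

With these made-up values the pair `[h_a, h_b]` forms an admissible set, while `[h_a]` alone does not: `h_a` overestimates the second cost component of the path through b at the start node. Each function covers one nondominated solution path, which is all the set definition requires.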
In order to prove the admissibility of A**, we first consider the
following Lemma and its Corollary, both of which presume the use of an
admissible set of heuristics with A**.

Lemma 1. At any time before A** terminates, for every undiscovered
nondominated solution path $P \in \mathbf{P}^{**}(s, \Gamma^{**})$

    $\exists\, n \in \text{OPEN},\ n \in \{\text{nodes on } P\} \ni \exists\, f(n) \in F(n) \ni f(n) \leq c(P)$.

Proof. Consider any undiscovered nondominated solution path
$P \in \mathbf{P}^{**}(s, \Gamma^{**})$. If no such path exists, then there is nothing to prove. Let
$P = s, n_1, n_2, n_3, \ldots, n', \ldots, \gamma$. There is always a node from $P$ on OPEN. This
is shown by induction as follows. At the start of the algorithm, s is in
OPEN. Assume that, at some later point in the search, a nongoal node of $P$
is on OPEN. Let $n_{k-1}$ be the deepest such node on $P$. At the next iteration
of A**, $n_{k-1}$ is either chosen for expansion or not. If not, then it
remains on OPEN for the next iteration.

If $n_{k-1}$ is chosen for expansion, then it will be found to have at
least one successor on $P$, call it $n_k$, and this successor will be added to
OPEN as $n_{k-1}$ is removed and placed on CLOSED. Therefore, since k was
arbitrary, complete induction shows that there will always be a node from $P$
on OPEN until $\gamma$ is chosen for expansion and $P$ is discovered to be a
nondominated solution path.

Since we have shown that every undiscovered nondominated solution path
will have a node on OPEN at all times before A** terminates, let n' be the
shallowest such node from $P$. Since n' is the shallowest node from $P$ on
OPEN, all of its ancestors must be on CLOSED. Furthermore, by assumption,
the partial path $s, n_1, n_2, n_3, \ldots, n'$ from $P$ is nondominated. Therefore,

    (eq. 2)  $\forall\, g(n') \in G(n') \;\; \exists\, g^{**}(n') \in G^{**}(n') \ni g(n') = g^{**}(n')$.

In other notation this equation states that

    $c(s, n_1, n_2, n_3, \ldots, n') = c(P^{**}(s, n')) = g(n')$

and since $G^{**}$ contains all nondominated costs for paths from s to n',
$g(n') \in G^{**}(n')$. Using the admissibility of H, we know that for n'

    $\forall\, h_2^{**} \in H_2^{**} \;\; \exists\, h \in H \ni h(n') \leq h_2^{**}(n')$.

icic  icic  icic 

By  the  definition  of  H  since  n'  e  (nodes  on  P)  and  PeP  (s,T  ),  3 

h**2  €  H**2  3  h**2(n')  "  c(P)  -  c(P**(s  ,n' ) ) 

icic  ific  icic 

From  above  we  had  g(n')  -  g  <n')  -  c(P  (s,n'))  for  some  P  (s,n')  on  P 
so  we  can  now  combine  to  see  that 

**  **  .**  ** 

3h  26H  29h  2”  c<p)  '  8  (Ho¬ 

using  equation  1  from  the  preliminary  results  above,  we  know 
**  **  ★* 

3  f  €  F  3  f  (n')  -  c(P) 
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so  that 


3  h 


** 


Combining 

f(n') 


**  ,  **  ,  .  **  ** 

2  6  H  2  3  h  2  n  ^  “  f  <n  >  *  6  <n )• 

the  results  above,  we  may  complete  the  proof  as  follows: 


-  g(n')  +  h(n') 

by  definition 

kk 

-  g  (n')  +  h(n') 

substituting  from  eq.  2 

■irk  kit 

„  ,**  ** 

S  g  (n')  +  h  2(n') 

for  some  h  ^  s  ti  ^ 

by  the  admissibility  of  H 
f**(n')  for  some  f**(n')  e  F**(n') 


by  eq.  1  above 


kk  kk 

-  c(P)  for  some  P  e  P  (s,r  ) 


kk  kk 

by  eq.  1  also,  since  f  (n' )  e  C 

In summary, we have completed the proof by demonstrating the existence of
an open node n' on an arbitrary nondominated solution path P such that

f(n') ≤ c(P) for some c(P) ∈ C**.  □


Corollary 3. Let n' be the shallowest open node on a nondominated path
P**(s,n'') to an arbitrary node n'', not necessarily a goal node. Then

∀ g(n') ∈ G(n')  ∃ g**(n') ∈ G**(n')  ∋  g(n') = g**(n')

so that A** has already found a nondominated path to n' and that path will
remain nondominated in P(s,n') throughout the search.

Proof. This proof is the same as that for Lemma 1 where the fact that n'
was on a solution path was not used to demonstrate the validity of equation
2.  □

Theorem 3. A** using any set of admissible heuristics is admissible.

Proof. By Corollary 2 we know A** terminates whenever a finite length
solution path exists so it is sufficient to show that A** will not
terminate until all nondominated solution paths have been found; i.e.,
placed on NONDOMINATED. Assume that A** terminates, but there exists some
nondominated solution path P that is as yet undiscovered. By Lemma 2

∃ n ∈ OPEN, n ∈ (nodes on P)  ∋  ∃ f(n) ∈ F(n)  ∋  f(n) ≤ c(P)

for some c(P) ∈ C**.

A** terminates in only 2 cases:

1) OPEN is empty.

2) All current partial path cost estimates for nodes on OPEN are
   dominated by solution path costs in NONDOMINATED.

Case 1 is ruled out by our assumption that n was in OPEN. According to the
condition in case 2, the termination we assumed must have occurred because
f(n) was dominated by a member of NONDOMINATED. This is a contradiction
since f(n) ≤ c(P) implies that f(n) is nondominated in C** and therefore
nondominated among all solution paths that could be in NONDOMINATED.
Therefore, A** cannot terminate before all nondominated solution paths have
been discovered.  □

IV. EXAMPLE. The following multiobjective shortest path problem
illustrates the operation of the A** algorithm on a very simple state-space
graph. The purpose of the example is to present an overview of the general
behavior of the algorithm, without emphasizing the computational details of
the procedure. The computational aspects of the A** algorithm in
particular, and of multiobjective search algorithms in general, present an
interesting topic for further research.

The problem state-space graph is shown in Figure 1, which also shows
the values of the arc costs for the problem. The value of M for this
simple example is 2. Based on the information in the figure, we may
calculate the derived cost information that is shown in Table 1. Also
shown in Table 1 are the heuristic function values used for this
illustration. The heuristic function values were simply defined by the
nondominated members of the set of costs corresponding to the arcs
emanating from a node. For example, the set of costs of arcs for the
start node is {(1,2), (3,1), (1,1)}, of which one is dominated so the
heuristic function values for the start node are {(1,2), (3,1)} as shown in
Table 1. These heuristic function values are clearly admissible since they
will always provide lower bounds on the possible costs of paths to the goal
set.
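
The next-arc-cost heuristic described above can be sketched in a few lines
of Python. This is only an illustrative sketch: the names `dominates` and
`nondominated` and the arc costs used below are assumptions introduced
here, not values taken from Figure 1 or Table 1. Costs are vectors to be
minimized in every objective (M = 2 here).

```python
from typing import Iterable, Set, Tuple

Cost = Tuple[int, ...]  # one entry per objective; smaller is better


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(costs: Iterable[Cost]) -> Set[Cost]:
    """Keep only the cost vectors that no other vector in the set dominates."""
    pool = set(costs)
    return {c for c in pool if not any(dominates(d, c) for d in pool)}


# Hypothetical costs of the arcs leaving some node (not the paper's data).
arc_costs = [(1, 2), (3, 1), (2, 3)]

# Next-arc-cost heuristic: the nondominated outgoing arc costs of the node.
heuristic_values = nondominated(arc_costs)
```

In this made-up instance (2, 3) is dominated by (1, 2), so the heuristic
set for the node is the remaining pair of cost vectors. The same
`dominates` test is what a termination check against NONDOMINATED would
rely on.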


Table  1  provides  the  information  necessary  to  illustrate  one  of  the 
statements  made  in  Section  III.  A.  Nodes  2,  7,  and  7^  all  provide  examples 
of  the  fact  that  the  set  F**2(n)  may  not  be  contained  in  C**,  even  if  n  is 
on  a  nondominated  solution  path.  As  a  specific  example, 

F**2(2) = {(7,9), (9,5), (8,8)},

but  the  path  (a, 2, 5, 7, 7^}  with  cost  (7,9)  is  not  a  nondominated  solution 
path.  In  other  words,  while  this  path  with  cost  (7,9)  is  nondominated 
among  those  solution  paths  constrained  to  go  through  node  2,  it  is  not 
nondominated  among  all  solution  paths. 

Operation of the algorithm with the next-arc-cost heuristic is easily
followed using the sequence of 11 graphs that make up Figure 2. Each
graph illustrates the state of the search for an iteration of the
algorithm. Arcs of the initial problem state-space graph are shown as
dashed lines. As the arcs are traversed as part of the search, they are
made solid in the figures and the final solution paths are shown as
additional numbered dashed lines in the graphs. Nodes of the initial
problem state-space graph are shown as dashed circles. OPEN nodes are
shown with double circles, the node currently being expanded as part of the
search is depicted with three circles, and CLOSED nodes are indicated with
solid circles. As shown in the last panel of Figure 2, A** correctly
identifies the three nondominated solution paths in this example. The
solutions and their associated costs are
Table 1. Derived cost values and next-arc-cost heuristic
estimates for example problem.

Figure 2. Operation of A** on example problem.


V. CONCLUSIONS. This paper has presented some early results in an ongoing
study of multiobjective search. We defined an abstract multiobjective
search problem and outlined an adaptation of the single-objective A*
algorithm, A**, for solving it. By suitably adapting the terminology used
in work on A*, we were able to show that the valuable properties of
completeness and admissibility are inherited by A**. The behavior of the
algorithm was briefly illustrated on a simple two-objective shortest path
problem.

There are two basic directions for future research in the area of
multiobjective search. On the theoretical side, much additional work must
be done to complete the development of A**. For A*, heuristic functions
may be compared as to their effectiveness in directing search. In this
context, the definition of dominance of one heuristic over another becomes
useful in identifying the best heuristics. The generalization of this
concept of dominance among heuristics to the multiobjective case will be a
useful theoretical construct. Under specific assumptions about the
information available to aid in the search process, it can be shown that A*
is an optimal search algorithm in some sense. The proof that A** is also
optimal in some sense, and the definition of the exact conditions on that
optimality, are near-term research objectives. Another item of theoretical
interest is the multiobjective generalization of the AO* algorithm, the
counterpart of A* for use in searching AND/OR graphs.

The other, and perhaps more critical, direction for future research
concerns the practical uses of multiobjective search. The preliminary work
done so far indicates that these search routines will be very
computation-intensive. Issues of computational tractability and efficiency
will certainly become critical for any practical applications of
multiobjective search. Actual heuristics available for real applications
will probably be hard to develop and admissibility will almost certainly be
impossible to prove or guarantee. Further work on searching with
inadmissible heuristics will therefore be of practical significance.
ABSTRACT

We propose a modal system of algebras for knowledge representation.
Based on this formal setting, we use the combination of semantic
evaluation and proof theoretic techniques for the design and
implementation of Expert Database Systems.

We  see  expert  databases  as  dynamic  objects  and  use  a  system  of 
modal  logic  for  the  specification  of  their  dynamic  properties.  The  possible 
worlds  of  our  modal  system  are  instances  of  the  expert  database  system 
which  are  defined  as  many-sorted  algebras.  Thus  our  framework  is  a 
modal  logic  system  of  algebras  where  the  signature  of  algebra  is  the  basis 
for  the  schema  of  the  expert  database.  With  this  setting,  in  addition  to 
ordinary  database  operations,  many  sophisticated  expert  facilities  can  be 
provided. 


1.  INTRODUCTION 

Conventional  database  systems  are  mainly  concerned  with  storage 
and  retrieval  of  data,  and  the  efficiency  of  these  activities.  Generally,  all 
of  the  data  must  be  explicitly  stored  and  there  is  no  mechanism  for 
deriving  new  facts  from  the  existing  information.  In  expert  systems, 
which  are  often  idealized  as  systems  that  can  work  like  human  beings, 
the  emphasis  is  largely  on  the  deduction  of  new  facts,  and  they  are  not 
confined to the data stored. In addition to giving precise and complete
answers to questions, expert database systems (or inferential databases)
should be able to cope with queries such as: "What would be the
consequence if X happens?" (hypothetical queries); "Why would X happen?"
(causal queries); "What are the objects that are directly or indirectly
related to a certain object?" (transitive closure); "Which objects can be
candidates for solutions (in addition to the definite answers obtained
from the database)?"

Thus the key issue is to find a suitable setting in which both data and
knowledge can be stored, accessed and updated. While theorem proving
techniques can provide the machinery to cope with the above, we note
that, at least with the present technology, theorem proving is not
efficient enough for answering ordinary database queries in systems with
large amounts of data. In the case of simple database queries, we have
straightforward computation where we know which actions must take
place. Therefore, there is no need for random searches and tentative
reasoning which require a great deal of computation time. To deal with the
requirements above we propose the combination of semantic evaluation
and proof theoretic techniques as tools for the design of inferential
databases, whereby ordinary queries are computed straightforwardly, and
deduction is used for more sophisticated ones.

*  A portion of the formal part of this article has been presented in [Gol-86].

Expert  database  systems  (abbreviated  to  EDS)  are  seen  as  dynamic 
objects,  where  updates  change  the  state  of  the  database  and  the  states 
are  used  for  answering  queries.  For  the  dynamic  part  we  develop  a  modal 
logic  system.  The  domain  of  interpretation  (or  the  universe)  of  a  modal 
system for databases is the set of database instances, and the
accessibility relation is determined by the update functions.

An EDS instance is seen as a collection of sets together with a
collection of functions mapping these sets to each other. As in abstract data
type  specification  methodology,  we  use  the  signature  of  the  algebra  as 
the  basis  for  the  type  checker  and  the  syntax  checker  of  the  database 
language.  Queries  are  expressions  which  are  built  up  out  of  the  symbols 
in  the  signature  and  which  comply  with  the  precise  formation  rules  given 
by  the  query  language.  The  semantics  of  a  query  is  defined  to  be  the  value 
which  is  assigned  to  it  by  the  algebra  representing  a  database  instance. 
Integrity  constraints  are  expressed  as  boolean  valued  expressions  that 
must  hold  in  all  instances. 

The  power  of  deduction  is  provided  by  allowing  inference  rules  which 
are  activated  by  programs  (queries)  or  by  users.  The  inference  rules  are 
also  expressions  of  type  boolean  (like  the  integrity  constraints).  Although 
the deduction rules can be invoked by the language processor, it is
possible to define operators which explicitly trigger the inferencing
mechanism [Gol-84].


2.  RELATED  WORK 


2.1. The Algebraic Approach:

This  line  of  work  has  its  roots  in  the  research  efforts  on  the 
specification  of  abstract  data  types  [Zil-74  ,  Gut-76  ,  ADJ-77].  Since  a 
conventional  database  (at  a  certain  level  of  abstraction)  can  be  viewed  as 
an  abstract  data  type,  it  has  been  proposed  that  a  natural  application 
area for algebraic specification theory is in the formal specification of
databases. This view was taken by Ehrig et al. [EKW-78] who proposed a
hierarchic approach by building tables and sequences. Also taking this
view, Wirsing et al. [DMW-82] gave a formal definition of databases and
introduced primitives that mimic the generalization and aggregation
structures [SmSm-77].

It  is  pointed  out  in  [Gol-84]  that  a  severe  drawback  of  these  works  is 
that they do not cope with the dynamic aspects of databases. We
eliminate this shortcoming by introducing our modal system.

Finally,  we  point  out  the  novel  extensions  of  our  approach.  We  give 
the  following  extensions  to  the  ordinary  notion  of  universal  algebra: 

we  add  "variable  binding  operators"  such  as  the  set-construction 
operator  and  the  quantifiers,  to  the  collection  of  operators; 

and  we  allow  the  operators  in  the  algebra  to  work  on  union,  cartesian 
product  and  powerset  of  types. 


2.2.  The  Logical  Approach: 

Three directions can be distinguished here. The first and most
prominent approach is using Prolog as the language for all purposes (which,
as mentioned, is inadequate for large databases) [HaSe-84]. The second
approach is interfacing a Prolog system with a relational database system,
which will have special problems due to the addition of a communication
system. (For example, the authors of [VCJ-83] note the limitations of
using Prolog directly as a system and interface it with a relational
database.) Some other variations are discussed in [PCG-96]. The third
approach is to consider a richer framework that can cater for a wider
variety of tasks. Our approach falls within this category.

Our  modal  logic  system  is  similar  to  Hoare-style  logics  [Gold-82].  The 
motivation  for  its  introduction  came  from  our  realization  that  the  NEXT 
operator  of  temporal  logic  cannot  adequately  reason  about  the  next  EDS 
state  because  it  assumes  a  fixed  sequence  of  states.  Note  that  from  any 
state,  depending  on  the  update  to  be  performed,  there  are  many  states 
that  we  can  go  to.  (That  is,  "NEXT  instance"  will  depend  on  the  update.) 
Our  modal  operators  are  like  the  NEXT  operator  of  temporal  logic  but  are 
parameterized  with  respect  to  the  update  being  performed. 


2.3.  The  Artificial  Intelligence  Approach 

A survey and tutorial on expert systems is presented in [HWL-83].
In  this  book,  the  authors  describe  many  concepts  which  are  important 
in  the  design  and  construction  of  expert  systems,  and  then  analyze  a 
number  of  existing  expert  systems,  including  MYCIN  [Sho-76]  and  its 
derivatives,  DENDRAL  and  Meta-DENDRAL  [BuFe-70],  SAINT  [Sla-61]  and 
its  successors,  HEARSAY  [Erm-80]  and  its  other  versions. 

A  quick  study  of  these  systems  shows  that  none  have  a  sound  formal 
foundation and that generally they use ad hoc methods for finding
solutions within large search spaces.

3.  FORMAL  SPECIFICATION  OF  EXPERT  DATABASE  SYSTEMS

The  framework  will  be  constructed  in  two  steps.  First,  the  static  part 
will  be  developed  and  then  we  design  the  dynamic  part  on  top  of  it. 

We  consider  four  classes  of  symbols  within  a  given  alphabet.  We 
assume  that  the  appearance  of  each  symbol  will  determine  which  class  it 
belongs to. The four classes are: sort symbols, function symbols,
operation symbols, and variables. We also assume that the two sorts boolean
and  integer  are  given. 


3.1.  Primitives  of  EDS 

We define inductively simple-type-expressions to be:
o sort symbols
o or of one of the forms:
o γ_1 ∪ γ_2,
o (γ_1 × γ_2 × ... × γ_n),
o and P(γ_1),

where for some n, for 1 ≤ i ≤ n, γ_i is a simple-type-expression. We will see
that P(A) will be interpreted as the powerset of the set A.

Given n to be a natural number, a function-type-expression of arity n
has the form

γ_1, γ_2, ..., γ_n → γ_n+1

where for 1 ≤ i ≤ n+1, γ_i is a simple-type-expression.
Operation-type-expressions are defined in a similar manner. For example,
the operation-type-expression for the operation symbol "+" is

int, int → int.

A  signature  is  a  function  which  assigns  a  function-type-expression  to 
each  function  symbol  and  a  sort  symbol  to  each  variable  symbol.  Thus, 
variables,  both  local  and  global,  are  typed  (sorted)  by  the  signature  and 
not  by  the  user.  There  is  an  unlimited  supply  of  variables  of  each  type. 
The  signature  is  thus  the  specification  for  the  type-checker  and  the 
syntax-checker  of  the  language. 

Given a signature Σ and a function symbol φ in the domain of Σ, the
arity of φ in Σ is the arity of Σ(φ).

Example. A small portion of an EDS for aviation purposes can be specified
by giving the necessary sort symbols, function symbols and the function
type expressions for each of the function symbols. Some of the sort
symbols are 'flights', 'aircrafts', 'bases' and 'staff'. Some function
symbols are 'destination_of', 'flight_given_to', 'captain_of' and
'crew_of'. Here we present the unique type-expressions for some of these
functions.

captain_of        flights → staff
crew_of           flights → P(staff)
flight_given_to   aircrafts → flights
destination_of    flights → bases
is_suitable_for   aircrafts, flights → boolean
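
The role of the signature as the specification of a type-checker can be
illustrated with a small Python sketch. It is a sketch under stated
assumptions, not the authors' implementation: the encoding of expressions
as nested tuples, the names `SIGNATURE`, `SORT_OF` and `type_of`, and the
sort assigned to the variable F are all hypothetical. The function types
are the ones listed in the example above.

```python
from typing import Dict, Tuple

# A function-type-expression: (argument types, result type).
FunType = Tuple[Tuple[str, ...], str]

# The signature assigns a function-type-expression to each function symbol.
SIGNATURE: Dict[str, FunType] = {
    "captain_of": (("flights",), "staff"),
    "crew_of": (("flights",), "P(staff)"),
    "flight_given_to": (("aircrafts",), "flights"),
    "destination_of": (("flights",), "bases"),
    "is_suitable_for": (("aircrafts", "flights"), "boolean"),
}

# The signature also assigns a sort to each variable or constant symbol
# (assumed assignments, for illustration only).
SORT_OF: Dict[str, str] = {"AR008522": "aircrafts", "F": "flights"}


def type_of(expr) -> str:
    """Return the type of an expression, or raise TypeError if ill-formed.

    An expression is either a symbol (str) or a tuple (function, arg1, ...).
    """
    if isinstance(expr, str):
        return SORT_OF[expr]
    fname, *args = expr
    arg_types, result_type = SIGNATURE[fname]
    actual = tuple(type_of(a) for a in args)
    if actual != arg_types:
        raise TypeError(f"{fname} expects {arg_types}, got {actual}")
    return result_type
```

For instance, `type_of(("captain_of", ("flight_given_to", "AR008522")))`
type-checks because the result type of the inner application matches the
argument type of the outer one, whereas applying `captain_of` directly to
an aircraft is rejected.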

For a signature Σ, we define the set of well-formed expressions on Σ
in the usual way. Details of this section were presented in [Gol-83].
Amongst the operations provided, there are a number of variable binding
operators such as the logical quantifiers and the set construction
operator. Bound and free occurrences of variables in expressions can be
detected syntactically in the usual way.

Given a signature Σ, static integrity constraints on Σ are defined to
be a collection of well-formed expressions of type boolean on Σ. We will
use Γ_Σ for a set of integrity constraints on a signature Σ.

Example. A static integrity constraint on our avionic EDS is the following
(variables are written in capital letters):

"Ages of all crew members must be greater than 18"

forall C ( age_of(C) GT 18 )

Given a signature Σ, inference rules on Σ (denoted by Ψ_Σ) are defined
as closed expressions of type boolean on Σ. Note that the formal
definitions of integrity constraints and inference rules are the same and
the only distinction is in their designation. An example of an inference
rule on the avionic EDS is: "Aircrafts with seating capacity less than 10 fly
at an altitude of less than 25000".

forall AIRCRAFT (capacity_of(AIRCRAFT) LT 10) implies
(altitude_of(flight_given_to(AIRCRAFT)) LT 25000)

An EDS schema is the triple (Σ, Γ_Σ, Ψ_Σ) where Σ is a signature, Γ_Σ is a
(possibly empty) set of constraints on Σ, and Ψ_Σ is a (possibly empty) set
of inference rules on Σ, such that Γ_Σ ∪ Ψ_Σ is consistent. It is interesting
to note that if Ψ_Σ is empty, then we have an ordinary database system.

Obviously  one  of  the  most  important  features  of  any  system  is  the 
language  provided  by  it.  We  will  see  in  the  following  section  that  the  query 
language  constructed  based  on  this  formalism,  despite  the  mathematical 
rigour,  has  a  very  simple  notation.  Although  we  cannot  talk  about 
hypothetical  queries  (because  they  are  expressions  of  our  modal  system) 
here we will present other types of queries which demonstrate the power of
the  proposed  language.  (Hypothetical  queries  will  be  introduced  after 
discussing  our  modal  system.) 

For a given signature Σ, a query is a closed expression on Σ in which
any variable is bound only once. As examples, we construct a few queries
of different types. The first query is an ordinary database query.

1.  Who is the captain of the flight to which aircraft AR008522 is
assigned?

captain_of(flight_given_to(AR008522))

In the evaluation of this query, the result returned by the function
'flight_given_to' will be given to 'captain_of'. This is simple function
composition.

2.  Destinations of all flights that take off from base AAA before the hour
1400.

{ (F, destination_of(F)) |
(origin_of(F) is AAA) and (departure_time_of(F) LT 1400) } F

While the variable F iterates over the elements of the set 'flights', a
set is constructed that contains flight numbers and destinations of all
those flights for which the said conditions hold. Obviously, F is a variable
of type 'flights'. The appearance of F on the very right indicates the
variable which is being bound by the set construction operator.
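
The set construction operator in query 2 behaves much like a set
comprehension. The following Python sketch mirrors the query under the
assumption that the functions 'origin_of', 'destination_of' and
'departure_time_of' are stored as lookup tables; the flight identifiers,
base names other than AAA, and times are made up for illustration.

```python
# Hypothetical database instance: the carrier of sort 'flights' and three
# functions on it, each stored as a dictionary.
flights = {"FL1", "FL2", "FL3"}
origin_of = {"FL1": "AAA", "FL2": "AAA", "FL3": "BBB"}
destination_of = {"FL1": "CCC", "FL2": "DDD", "FL3": "CCC"}
departure_time_of = {"FL1": 900, "FL2": 1530, "FL3": 1100}

# { (F, destination_of(F)) |
#   (origin_of(F) is AAA) and (departure_time_of(F) LT 1400) } F
result = {
    (f, destination_of[f])
    for f in flights  # F ranges over the carrier of its sort
    if origin_of[f] == "AAA" and departure_time_of[f] < 1400
}
```

Here the bound variable F of the query corresponds to the loop variable
`f`, and the condition to the right of the bar becomes the filter of the
comprehension.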

3.  To demonstrate the capability for dealing with queries that involve
computation of transitive closure, we formulate a query from a medical
expert database system.

Cures of all those diseases that can be caught as a result of having an
ulcer.

This query will require the computation of transitive closure of the
set-valued function "results_of_having". (Proofs on least fixed point and
termination are presented in [Gol-83].)

{ (D1, cures_of(D1)) | D1 isin S } D1
where
S = results_of_having(ulcer) union
Union { results_of_having(D2) | D2 isin S } D2

"union" is the ordinary set theoretical operator ∪. "Union" is the
operator which, when given a set of sets, computes the union of all included
sets. In the inductively defined part we have results_of_having(ulcer) as
the basis.
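
The inductive definition of S can be read as a least fixed point, which on
a finite instance may be computed by iterating from the empty set. The
Python sketch below is one possible reading, not the evaluation strategy
of [Gol-83]; the helper name `least_fixed_point` is an assumption, and the
disease and cure names (d1, c1, ...) are neutral placeholders rather than
medical facts.

```python
from typing import Callable, Dict, Set

# Hypothetical instance of the set-valued functions.
results_of_having: Dict[str, Set[str]] = {
    "ulcer": {"d1", "d2"},
    "d1": {"d3"},
    "d2": {"d1"},
    "d3": set(),
}
cures_of: Dict[str, Set[str]] = {
    "d1": {"c1"},
    "d2": {"c2"},
    "d3": {"c1", "c3"},
}


def least_fixed_point(base: Set[str], step: Callable[[str], Set[str]]) -> Set[str]:
    """Smallest S with S = base union Union { step(d) | d isin S }."""
    s: Set[str] = set()
    while True:
        # "Union" of a set of sets, then "union" with the basis.
        new = base | set().union(*(step(d) for d in s))
        if new == s:  # nothing changed: the fixed point is reached
            return s
        s = new


S = least_fixed_point(results_of_having["ulcer"], lambda d: results_of_having[d])

# { (D1, cures_of(D1)) | D1 isin S } D1
answer = {(d1, frozenset(cures_of[d1])) for d1 in S}
```

Because every iteration only adds elements and the carrier is finite, the
loop terminates; `frozenset` is used only so that the set-valued result of
`cures_of` can itself be stored inside a Python set.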


3.2.  Semantics  of  the  language 

Having discussed the syntax of the language, we use the notions of
algebra to give semantics to the formalisms. Informally speaking, a
many-sorted algebra is a function which assigns a set (a carrier) to every
sort symbol and a function to every function symbol.

Recall that we allowed several forms to be simple-type-expressions.
For a simple-type-expression γ, the set of all objects of type γ in an
algebra A, denoted by |A|_γ, is defined as follows:

i-   when γ is a sort symbol then |A|_γ = A(γ), that is the set that the
     algebra A assigns to it.

ii-  when γ is γ_1 ∪ γ_2 then |A|_γ = |A|_γ1 ∪ |A|_γ2

iii- when γ is (γ_1 × γ_2 × ... × γ_n) then |A|_γ = |A|_γ1 × ... × |A|_γn

iv-  when γ is P(γ_1) then |A|_γ = P(|A|_γ1), that is, the powerset.

The evaluation in A of expressions is carried out in the usual way.
Given an EDS schema S = (Σ, Γ_Σ, Ψ_Σ) where Σ is a signature, and Γ_Σ and Ψ_Σ
are as before, an algebra A is an S-algebra iff:

1.  For every function symbol φ in the domain of A, if Σ(φ) is
γ_1, γ_2, ..., γ_n → γ_n+1 then A(φ) returns an element of |A|_γn+1 when
given an element of |A|_γ1, an element of |A|_γ2, ..., and an element of
|A|_γn.

2.  The evaluation in A of all of the expressions in Γ_Σ results in true.

Let S = (Σ, Γ_Σ, Ψ_Σ) be an EDS schema. An EDS instance is the ordered
pair (S, A) where A is an S-algebra.

Given an expression Ω of type boolean, for an EDS instance i, we write
i |= Ω iff i evaluates Ω as true. We will use I to indicate the collection of
EDS instances on a given schema.


3.3.  The  Dynamic  Aspects  of  EDS 

The modal logic system that we will use for reasoning about dynamic
characteristics of the EDS is comparable with temporal logic. It was first
presented in [GMS-83]. Syntactically, we make a number of extensions. Σ
is extended to Σ' by including:

o update symbols μ_0, μ_1, ...,

o for each of the update symbols μ_0, μ_1, ..., we introduce a
corresponding modal operator [μ_0], [μ_1], ...,

o global variable symbols X_0, X_1, ... of each sort γ of Σ.

The set of well-formed expressions over Σ is extended to well-formed
expressions over Σ' by allowing the construct [μ]Ω as an expression of
type boolean, where Ω is of type boolean and μ is an update symbol. The
expression [μ]Ω is read as: "after the update μ is performed Ω will be
true". Note that each of the operators [μ] acts as an operator in a similar
way to the more familiar modal operators ALWAYS, NEXT, etc. In fact, one
can think of the [μ] as the ○ (NEXT) operator of temporal logic which is
parameterized with respect to the update being made.

Given Ω, Ω_1 and Ω_2 as expressions of type boolean on Σ' and X as a
global variable, we can extend our logic for deriving consequences by adding
the following axioms and rules:

Ax1.  |- [μ](Ω_1 → Ω_2) ≡ ([μ]Ω_1 → [μ]Ω_2)

Ax2.  |- ¬[μ]Ω ≡ [μ]¬Ω

Ax3.  |- forall X [μ]Ω(X) ≡ [μ] forall X Ω(X)

(X must be a global variable.)

Ru1.  If |- Ω_1 → Ω_2 and |- Ω_1, then |- Ω_2.

Ru2.  If |- Ω, then |- [μ]Ω.

The semantics for the modal extensions is defined as follows. For
every update symbol μ in Σ' we consider a function μ̂ which when given an
EDS instance returns an EDS instance, i.e.

μ̂ : I → I

where I is the set of EDS instances. Recall that we used i |= Ω to
indicate that Ω holds in the instance i. We extend our notion of
satisfaction to cope with modal expressions. Given an update symbol μ and
the corresponding update function μ̂ we have:

i |= [μ]Ω iff μ̂(i) |= Ω.

Other modal operators are not essential for our purposes but we can
easily capture them if necessary. By adding the "null" update to our
system we will get a modal system equivalent to S4.

Transition constraints are boolean type expressions over Σ'. (Transition
constraints are statements that guard the system through updates.)
For example, the constraint "ages cannot be reduced" is expressed as
follows:

forall  X  forall  Y  ((age_of(X)  is  Y)  implies  ([u]  (age_of(X)  GE  Y ))) 

This  expression  reads  as  follows:  for  any  person  X  and  any  age  Y,  if  the 
age  of  X  is  Y  then  after  performing  any  update  u  the  age  of  X  will  be  at 
least  Y. 
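
The satisfaction rule for modal expressions (an expression prefixed by an
update holds in an instance iff the plain expression holds in the updated
instance) and the transition constraint above can be mimicked in Python.
The sketch assumes that an instance is a dictionary of lookup tables and
that an update is a pure function from instances to instances; the names
`box`, `birthday` and `ages_not_reduced` and the sample data are
hypothetical.

```python
from typing import Callable, Dict

Instance = Dict[str, Dict[str, int]]     # assumed encoding: function name -> table
Update = Callable[[Instance], Instance]  # the function associated with an update symbol
Formula = Callable[[Instance], bool]     # a boolean expression evaluated in an instance


def box(update: Update, formula: Formula) -> Formula:
    """The modal expression: true in i iff formula is true in update(i)."""
    return lambda i: formula(update(i))


def birthday(person: str) -> Update:
    """Hypothetical update: a new instance in which one age is incremented."""
    def apply(i: Instance) -> Instance:
        ages = dict(i["age_of"])  # copy, so the original instance is untouched
        ages[person] += 1
        return {**i, "age_of": ages}
    return apply


def ages_not_reduced(update: Update, i: Instance) -> bool:
    """forall X forall Y ((age_of(X) is Y) implies ([u] (age_of(X) GE Y)))"""
    return all(
        box(update, lambda j, x=x, y=y: j["age_of"][x] >= y)(i)
        for x, y in i["age_of"].items()
    )


i0: Instance = {"age_of": {"ann": 30, "bob": 41}}
ann_is_31: Formula = lambda i: i["age_of"]["ann"] == 31
```

With these definitions `ann_is_31` is false in `i0`, while
`box(birthday("ann"), ann_is_31)` is true in `i0`, and `i0` itself is left
unchanged. Since updates are functions, negation commutes with `box`, in
line with Ax2.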


3.4.  Hypothetical  Queries 

Using  this  type  of  query,  the  user  asks  the  expert  system  to  make 
predictions  based  on  certain  assumptions.  For  example,  in  a  company, 
the  manager  may  ask  the  question:  "if  I  increase  Jack’s  salary  by  1,000 
dollars,  would  he  then  earn  more  than  George?"  Such  a  query  is 
expressed  as: 

[increase_salary(Jack, 1000)] sal_of(Jack) GT sal_of(George)

In this query the system will not make the update effective but will
assume that the update is performed and then answers the question.

Similarly,  suppose  the  management  decides  to  ensure  that  no 
employee  shall  earn  less  than  20,000  dollars  and  they  want  to  know 
whether a uniform 10% pay raise would achieve this. Such a query is
formulated as:

forall EMP ([increase_salary(EMP, sal_of(EMP)/10)] sal_of(EMP) GT 20000)

This  expression  reads  as  follows:  "For  every  employee,  after  increasing 
the  salary  of  that  employee  by  10%,  is  it  true  that  her  salary  will  be  more 
than  20,000  dollars?". 
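
A hypothetical query can be evaluated by applying the update to a copy of
the current instance and answering the question there, so that the stored
data are never modified. The Python sketch below illustrates this for the
two salary queries; the dictionary encoding, the third employee and all
salary figures are assumptions made only for the example.

```python
import copy
from typing import Dict

Database = Dict[str, Dict[str, int]]


def increase_salary(db: Database, emp: str, amount: int) -> Database:
    """Return the instance that would result from the update; db is not changed."""
    new_db = copy.deepcopy(db)
    new_db["sal_of"][emp] += amount
    return new_db


# Made-up salaries, in dollars.
db: Database = {"sal_of": {"Jack": 18500, "George": 19000, "Mary": 18000}}

# [increase_salary(Jack, 1000)] sal_of(Jack) GT sal_of(George)
after = increase_salary(db, "Jack", 1000)
jack_earns_more = after["sal_of"]["Jack"] > after["sal_of"]["George"]

# forall EMP ([increase_salary(EMP, sal_of(EMP)/10)] sal_of(EMP) GT 20000)
raise_is_enough = all(
    increase_salary(db, emp, sal // 10)["sal_of"][emp] > 20000
    for emp, sal in db["sal_of"].items()
)
```

Both comparisons in the first query are evaluated in the hypothetical
instance `after`, matching the reading of the modal operator, while `db`
keeps its original contents throughout.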


4.  CONCLUSION 

Research  into  the  mathematical  foundations  of  intelligent  systems  is 
of  increasing  importance  for  a  healthy  growth  of  computing  technology. 
Database  systems  (with  different  degrees  of  sophistication)  and  expert 
systems are at the heart of a great deal of work on information
processing systems, software engineering development environments and on
many branches of artificial intelligence. We note that the progress has
been slow for systematic software design methodologies. The
well-recognized "software crisis" is a symptom of the limitations inherent
in the current traditional approach to the specification, design and
programming of complex systems. In recent years, these problems have
been becoming steadily more apparent. They will be the dominant
limiting factor in our ability to apply ever more powerful computing
hardware to solve complex problems. Ideally, the current "brute force"
methods and ad hoc design will be replaced by sound formally-based
techniques.

Here, we have developed a formal setting for the specification of
intelligent systems. Based on this setting, we were able to define a
functional query language powerful enough to do several novel things. The
notation of this language is based on conventional
mathematical notation, similar to SETL and SASL.

Toward optimal feature selection:

Past, Present and Future.

Wojciech  Siedlecki  and  Jack  Sklansky 
University  of  California,  Irvine 
Irvine,  CA  92717 

ABSTRACT.  Over  the  last  twenty-five  years,  extensive  research  has  taken  place  on  the  development 
of  efficient  and  reliable  methods  for  the  selection  of  features  in  the  design  of  pattern  classifiers,  where 
the  features  constitute  the  inputs  to  the  classifier.  The  quality  of  this  design  depends  on  the  relevancy, 
discriminatory  power  and  ease  of  computation  of  various  features. 

Selecting features is an extremely difficult task, fraught with both theoretical and computational
problems. An effective mathematical theory for feature selection seems achievable only for a very specific aspect
of the problem: linear transformations for reducing the dimensionality of the feature space, with the
assumption that data are drawn from normal distributions [1,2,3,4]. The theoretical problems are usually associated
with two closely related questions:

a)  “What  does  it  mean  that  a  feature  is  good  or  irrelevant?” 

b)  “What  criteria  should  be  used  to  evaluate  features?”. 

From the standpoint of Bayesian decision rules there are no bad features. One can never improve the
performance (usually understood as the error committed by the classifier) of a Bayes classifier by eliminating
a feature (this property is called monotonicity). However, in practice the assumptions in the design of Bayes
classifiers are (almost) never valid. As a consequence, it is possible to improve the performance of a nonideal
classifier by deleting a feature (this phenomenon will be discussed later). Moreover, for a given amount of
data, reducing the number of features increases the accuracy of estimates of the classifier’s performance.
These two facts have tremendous consequences for computational problems associated with feature selection
and have led in the past to other methods for evaluating features [5,6,7]. These methods do not evaluate
the performance of a classifier associated with a given set of features, but rather tend to approximate the
Bayes error for this set of features. The criteria used by these methods (for instance Bhattacharyya distance
or Vajda’s entropy) satisfy the monotonicity property, which permits the use of efficient computational
techniques. However, some evidence [8] indicates that they do not induce over an arbitrary set of features
the same preference order as would be obtained by comparing the errors of the Bayes classifier. Thus, it
seems that the only promising and legitimate way of evaluating features must be through the error rate of
the classifier being designed (this also satisfies our intuitive understanding of the design policy, although it
has some theoretical drawbacks [5,9]).

Unfortunately, so far none of the forms of classifiers realisable in practice by known techniques exhibits
the monotonicity property. This fact is important when we realise that the problem of feature selection is
essentially equivalent to searching a directed graph (at the root node all features are accepted) and could be
solved by artificial intelligence or “AI” (e.g. branch and bound [6]) techniques. Moreover, the number
of all possible subsets of an n-element set of features is around $2^n$ and, therefore, even for small n (say,
10) any brute force method leads to a computational dead end (especially when the evaluation of the classifier’s
error is costly).
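
To make the size of the brute-force problem concrete, the following sketch evaluates every non-empty subset of n features and counts the evaluations. The function names and the toy error function are hypothetical; in practice each call to `error_rate` would involve training and testing a classifier, which is what makes the exhaustive approach costly.

```python
from itertools import combinations

def exhaustive_best_subset(n_features, error_rate):
    """Evaluate every non-empty subset; return (best subset, its error, evaluations)."""
    best, best_err, count = None, float("inf"), 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            count += 1
            err = error_rate(frozenset(subset))
            if err < best_err:
                best, best_err = frozenset(subset), err
    return best, best_err, count

def toy_error(subset):
    """Hypothetical error rate: features 0 and 2 help, every feature costs 0.01."""
    useful = {0: 0.20, 2: 0.15}
    return 0.5 - sum(useful.get(f, 0.0) for f in subset) + 0.01 * len(subset)

best, best_err, evaluations = exhaustive_best_subset(4, toy_error)
```

With four features the loop performs 2^4 - 1 = 15 evaluations; each added feature doubles that number.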

Over the last five years, intensive research on feature selection has been carried out at the University of
California, Irvine [10], leading to a group of suboptimal but efficient and robust methods. This group
includes:

1. methods utilising the idea of approximate monotonicity [10],

2. other AI methods for graph searching.

Another promising method, currently under consideration, is based on the observation that the monotonicity
property of a classifier’s error rate is highly related to the optimality of this classifier [11]. This method does
not require any search, but evaluates all features at the same time in a fuzzy decision process that involves
the assignment of a weight to each feature. In this report we will discuss the above three classes of methods
in more detail.

1. THE PAST. A HISTORICAL NOTE. The pioneering work in the area of feature selection is associated
with the names of Sebestyen [12], Lewis [13] and Marill and Green [14], who made their contributions in
the early sixties. Since at that time the theoretical framework for evaluating the error rate of classifiers was
also in its preliminary stage of development, the original approaches to feature selection were based on the
concept of probabilistic class separability measures and entropies. In some cases (e.g. [13]) the independence
of features was assumed and the features were selected on the basis of their individual merits. However,
even such a simplified model did not guarantee the optimality of a selected feature subset (for instance, the
two individually best independent features don’t have to be the best pair, as was pointed out by Cover [15]).

The question of the trade-off between the optimality and efficiency of algorithms for NP-problems
(feature selection, by definition, seems to qualify as an NP-problem) was recognized early, and the
mainstream of research on feature selection was thus directed toward suboptimal search methods. The
invention of sequential backward selection (SBS) in 1963 [14] gave rise to a family of suboptimal stepwise
forward and backward methods. The research in this direction was concluded by the generalization of these
algorithms proposed by Kittler in the 1970s [16]. Another approach to feature selection, based on the
concept of dynamic programming, was proposed by Chang [17], but this approach is burdened by numerous
restrictive requirements (e.g. the monotonicity condition and statistical independence of features) and,
therefore, has not been heavily pursued by other researchers.
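
As an illustration of the stepwise idea, sequential backward selection can be sketched as a greedy loop that repeatedly discards the feature whose removal leaves the lowest criterion value. This is a minimal sketch in its usual textbook form, not a transcription of [14]; the function name, the `target_size` stopping rule and the toy criterion are assumptions.

```python
def sequential_backward_selection(n_features, criterion, target_size):
    """Greedy SBS: start from the full set and drop one feature per step."""
    current = frozenset(range(n_features))
    while len(current) > target_size:
        # Candidate successors: every subset obtained by removing one feature.
        candidates = [current - {f} for f in sorted(current)]
        current = min(candidates, key=criterion)  # keep the most promising one
    return current

def toy_error(subset):
    """Hypothetical error rate: features 0 and 2 help, every feature costs 0.01."""
    useful = {0: 0.20, 2: 0.15}
    return 0.5 - sum(useful.get(f, 0.0) for f in subset) + 0.01 * len(subset)

selected = sequential_backward_selection(4, toy_error, target_size=2)
```

Only one path through the lattice is followed, so the number of criterion evaluations grows quadratically rather than exponentially with the number of features, at the price of losing any guarantee of optimality.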

The potential of any suboptimal search algorithm to select the worst
possible set of features was indicated by Cover and Campenhout [18]. A breakthrough came in 1977 with
the introduction of the branch and bound algorithm. The application of this method, proposed by Narendra
and Fukunaga [6], guaranteed the selection of an optimal feature subset if the monotonicity condition is
satisfied. The monotonicity condition requires that a criterion function J used to evaluate feature subsets
change (in our case: grow) monotonically over a sequence of nested feature subsets $\{F_1, \ldots, F_k\}$, that is

$$F_1 \subset F_2 \subset \cdots \subset F_k \;\Rightarrow\; J(F_1) \ge J(F_2) \ge \cdots \ge J(F_k). \qquad (1)$$

Based on this concept Narendra and Fukunaga also clearly defined which subset of features could not be
considered optimal. Roughly speaking, the branch and bound procedure searches the feature selection
lattice (Fig. 1) in an optimally organized way. (In the lattice, nodes represent feature subsets and links represent
the relation of subset inclusion. The subsets are coded by sequences of zeros and ones. One means that a
feature is present in a subset and zero means that the feature does not belong to it. The percentages next
to the nodes denote observed error rates of a hypothetical classifier.)

Since  the  kind  of  graph  generated  in  the  feature  selection  problem  (each  node  represents  a  subset  of 
features)  has  finite  depth,  the  depth  first  search  technique  appeared  very  effective  in  this  case  and  has 
resulted  in  a  very  efficient  enumeration  scheme. 

When no restrictions on examining nodes (feature subsets) in the graph have been assumed, the branch
and bound leads to exhaustive search. However, if each node is evaluated with the aid of a criterion function
$J_1$ and an upper limit (threshold) for its acceptable values is set (that is, some feature subsets are considered
infeasible), then the algorithm backtracks whenever an infeasible node is discovered. If the criterion function
has the monotonicity property (1), no feasible node is omitted as a result of early backtracking and, therefore,
the gained savings in the search time do not violate the “optimality” of the selection procedure. Now, among
all examined and, therefore, feasible subsets of features one can look for the best group of features according
to a second criterion $J_2$. If $J_2$ is also monotonic with respect to a sequence of nested feature subsets, but in
the direction opposite to that of $J_1$, then $J_1$ and $J_2$ can be interchanged, yielding a search for a feasible node
among the best nodes, which is equivalent to a backward branch and bound scheme (in which one takes the
empty set of features as a start node).
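
The backtracking scheme described above can be sketched as a depth-first enumeration that starts at the full feature set and removes features in increasing index order, so that every subset is generated at most once. This is a simplified illustration, not the algorithm of [6] in full; the names `j1`, `j2`, the weights and the threshold are assumptions, and the pruning step is only safe when `j1` never decreases as features are removed.

```python
def branch_and_bound(n, j1, threshold, j2):
    """Return (best feasible subset under j2, number of nodes visited)."""
    best, visited = None, 0

    def visit(node, start):
        nonlocal best, visited
        visited += 1
        if j1(node) > threshold:
            return  # infeasible: backtrack, which prunes all subsets below this node
        if best is None or j2(node) < j2(best):
            best = node
        # Remove only features with index >= start, so no subset is generated twice.
        for f in range(start, n):
            visit(node - {f}, f + 1)

    visit(frozenset(range(n)), 0)
    return best, visited

# Hypothetical monotonic criterion: removing a feature never lowers the error.
weights = {0: 0.20, 1: 0.00, 2: 0.15, 3: 0.05}

def error(subset):
    return 0.5 - sum(weights[f] for f in subset)

best, visited = branch_and_bound(4, error, threshold=0.16, j2=len)
```

Here the second criterion is simply the number of features, which shrinks along a path on which the error grows, matching the opposite-direction requirement in the text. In this toy run 7 of the 16 lattice nodes are visited.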

All the considerations regarding the branch and bound procedure as applied to optimal feature
selection are valid only for monotonic evaluation functions. Narendra and Fukunaga originally proposed to
use probabilistic separability measures as criterion functions. This approach, however, has a number of
disadvantages:

a)  feature  selection  is  done  based  on  finite  samples  and  should  refer  to  any  particular  classifier’s  performance 
rather  than  to  intrinsic  discriminant  properties  of  the  data,  which  cannot  be  reliably  uncovered  due  to 
the  sampling  process, 

b)  to  use  any  of  these  criteria  one  has  to  estimate  them  based  on  the  sample,  which  introduces  some  error 
and  can  turn  a  monotonic  criterion  into  a  nonmonotonic  one,  and 

c) as some evidence indicates [8], certain criteria in this class can give results which are optimal in the sense
outlined above, but the selected subset of features need not be optimal, where the optimality refers to the
expected performance of a classifier which would use this subset of features.

While  the  second  disadvantage  can  prevent  the  search  procedure  from  finding  an  optimal  solution,  the  first 
and  the  third  ones  are  much  stronger  when  a  selected  subset  of  features  is  to  be  used  to  build  a  practical 
classifier. 

As many authors pointed out, the only remaining alternative is to use the error rate of a classifier as
a design criterion. Unfortunately, due to phenomena similar in origin to those mentioned in the second of
the above disadvantages, the error rate of a classifier (if it is not a Bayes classifier) does not satisfy the
monotonicity condition. Such a case can be observed in the feature selection lattice presented in Fig. 1.
The error rate along the path (1111)-(1101)-(1001)-(1000) has a monotonicity defect at the node (1001).
So far the lack of monotonicity in the classifier’s error rate made it useless for the branch and bound
procedure. In 1985 Foroutan and Sklansky [10] introduced the concept of approximate monotonicity. Based
on the example of a locally trained piecewise linear classifier they showed that the error rate might be used
for optimal branch and bound. Although the supporting tests were done only for one data set, the idea
of approximate monotonicity constitutes another breakthrough in understanding and applying methods for
optimal feature selection for classifiers trained on finite samples.
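
A monotonicity defect of this kind is easy to detect mechanically once the error rates along a path are known. The sketch below uses the zero/one coding of subsets described for the lattice; the error values are hypothetical stand-ins (the percentages of Fig. 1 are not reproduced here), chosen so that the error drops when moving from (1001) to its subset (1000).

```python
def is_subset(child, parent):
    """True if every feature present in `child` (bit '1') is also present in `parent`."""
    return all(p == "1" or c == "0" for c, p in zip(child, parent))

def monotonicity_defects(path, error_rate):
    """Return (superset, subset) pairs along a nested path where the error decreases."""
    defects = []
    for parent, child in zip(path, path[1:]):
        assert is_subset(child, parent), "path must consist of nested subsets"
        if error_rate[child] < error_rate[parent]:
            defects.append((parent, child))
    return defects

# Hypothetical observed error rates, for illustration only.
observed = {"1111": 0.10, "1101": 0.12, "1001": 0.18, "1000": 0.14}
path = ["1111", "1101", "1001", "1000"]
defects = monotonicity_defects(path, observed)
```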

2. THE PRESENT: APPROXIMATE MONOTONICITY AND AI GRAPH SEARCH METHODS.
The concept of approximate monotonicity opened a new chapter in the research on optimal feature selection.
It allows the use of branch and bound to obtain with high confidence an optimal subset of features even
though the monotonicity condition is in some cases to some extent violated. Below we discuss two ways
of coping with the negative effects of the lack of monotonicity in the error rate on the optimal branch
and bound search procedure. Also we present other approaches to feature lattice search, originating from
artificial intelligence (AI).

2.1. THE BRANCH AND BOUND PROCEDURE FOR NONMONOTONIC CRITERIA. In their
work [10] Foroutan and Sklansky used a tolerance factor imposed on the assumed threshold for branch and
bound search. Namely, if the assumed upper limit of the error rate for a subset of features to be considered
feasible is $e_{max}$, then a subset of features, F, in fact is assumed

a) feasible, if the associated error rate e(F) is less than or equal to $e_{max}$,

b) conditionally feasible, if $e_{max} < e(F) \le e_{max}(1 + \Delta)$ and

c) infeasible, if $e(F) > e_{max}(1 + \Delta)$.
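
The three cases can be written as a small classification function. This is only a sketch of the rule as stated above; the function name and the string labels are chosen here for illustration, and the numbers in the usage lines are hypothetical.

```python
def classify_node(error, e_max, delta):
    """Classify a feature subset by its error rate, threshold e_max and tolerance delta."""
    if error <= e_max:
        return "feasible"
    if error <= e_max * (1 + delta):
        return "conditionally feasible"  # examined further, but never selected as best
    return "infeasible"                  # the search backtracks here

# Hypothetical values: threshold 15%, tolerance 10% of the threshold.
labels = [classify_node(e, e_max=0.15, delta=0.10) for e in (0.14, 0.16, 0.20)]
```

A branch and bound search using this rule would keep descending through conditionally feasible nodes while choosing its final answer only among the feasible ones.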

In this version the best subset of features is chosen only from the set of feasible nodes in the feature selection
lattice, but also conditionally feasible nodes are examined. In Fig. 2 a feature subset (00110101) is found
conditionally feasible and the search is continued. This allows the branch and bound algorithm to examine
the feature subset behind it, which is feasible. In experiments described in [10], despite the lack of
strict monotonicity, a procedure using the error tolerance was able to find an optimal subset of
features with over 90% in computational savings compared with exhaustive search.

[Fig. 2. Axes: error rate versus subsets.]

Another way of avoiding the negative effects of using an estimated and, therefore, by definition
nonmonotonic error rate of a classifier is to estimate the expected value of the error rate for
an examined subset of features. Assuming that departures from monotonicity are an effect of
estimation errors and the classifier’s true error rate should increase monotonically over a sequence of
nested feature subsets (1) we can try to estimate the general trend of error rate changes in the
practical classifier. We consider the error rate a function of nested feature subsets, which corresponds
to a path in the feature selection graph. Since the observed error rate may not be monotonic we can
observe that along this path it rises and falls even though the expected error rate does not decrease.

As a result, it might happen that the current node is infeasible based on its observed error rate, but that it
ought to be feasible because the expected error associated with a classifier trained on an infinite sample is
below the threshold of acceptability. In Fig. 3 the feature subset (00110101) would be considered infeasible
based on the observed error rate associated with it. However, as one can notice the trend along this path
(in this case we use linear prediction) indicates that the value of the expected error rate associated with this
subset should be less than the presumed threshold and, therefore, the subset is treated as if it were feasible.

Thus, if we analyze the trend of changes of the observed error rate over a sequence of nested
feature subsets and, based on this information, we use the approximation of the expected error rate
rather than the currently observed error rate, we may successfully use the branch and bound
algorithm to search for the optimal subset of features.
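
One simple way to realize such a trend analysis is to fit a least-squares line through the error rates observed so far along the path and to compare the fitted value at the current node, rather than the raw observation, with the threshold. The text says only that linear prediction is used; the particular fit below, the function name and the sample values are assumptions for illustration.

```python
def fitted_error(observed):
    """Least-squares line through (0, e_0), (1, e_1), ...; return its value at the last node."""
    n = len(observed)
    if n < 2:
        return observed[-1]
    mean_x = (n - 1) / 2
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(observed))
    slope = sxy / sxx
    return mean_y + slope * ((n - 1) - mean_x)

# Hypothetical error rates along a path of nested subsets; the last one is the current node.
path_errors = [0.10, 0.11, 0.12, 0.17]
threshold = 0.16
observed_feasible = path_errors[-1] <= threshold          # raw observation
trend_feasible = fitted_error(path_errors) <= threshold   # smoothed estimate
```

In this example the raw observation (0.17) exceeds the threshold while the fitted value (0.158) does not, so the node would be treated as if it were feasible.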

The strategy of enumeration in the branch and bound method is another important factor
influencing the efficiency and optimality of the feature selection process. When the monotonicity
condition is satisfied it does not matter in which order we examine the descendants of the
current node: we will always find the optimum solution and the number of visited nodes in the feature
selection lattice will be about the same in each case. In this case one usually takes the node with
the highest error rate as the next current node, for it increases the chance of finding the next
infeasible node and consequently of pruning some part of the lattice below it. However, when the
monotonicity condition is not satisfied, by using this strategy we could prune a part of the lattice
including feasible nodes and, which is likely, the best node.
Such a case is observed in the feature selection lattice depicted in Fig. 1. Here, if the threshold is set
to 15% the optimal node (0001) will be pruned due to the fact that it is a subset of the set (1001), which
is infeasible. On the other hand, if we select a subset with the lowest error as the next node in the lattice
this subset (0001) will be discovered in the sequence (1101)-(0101)-(0001). Hence, the strategy in which
we choose the node with the lowest error rate is more likely to avoid local nonmonotonicities and continue
search in those parts of the feature selection lattice that would be skipped by using the traditional strategy.

An important question in feature selection with the aid of branch and bound is how to choose
the threshold that defines the feasibility of subsets of features. We can assume that we do not want
a big degradation of the classifier's performance and, therefore, we set the threshold at a low level.
However, we don’t know what price we will pay for selecting an optimal subset of features. In other
words, we do not know in advance whether removing one feature from the selected optimal subset
would only minimally increase the error rate or whether by adding one feature we can
significantly improve the classifier’s performance. This might be important, since by properly setting
the threshold we could avoid an examination of a significant number of nodes in the feature selection
lattice.

A way to predict the best value of the threshold would be to use a sequential forward or backward
method to look for the best path in the feature selection lattice, where the best path is a path
along which the error rate increases as slowly as possible. By doing this we can scan the feature
selection lattice and obtain a function (Fig. 4) depicting the trade-off between the number of removed features
and an expected threshold, which must be set in order to find an optimal subset of features of this size. Or,
on the other hand, we can estimate the size of the optimal feature subset given some threshold.

Unfortunately, both forward and backward selection can easily be derailed. For instance, the forward
selection algorithm can add two features which are each the best choice at their step but are bad if used
together. This could be to some extent avoided if the sequential forward and backward methods are used
at the same time. We call this method a bidirectional search. Its concept originates from the MEA (i.e.
means-ends analysis) method used for problem solving in AI.

In the bidirectional search we conduct the search for the best path from two end nodes (that is, the
node representing the full set of features and the node associated with the empty set) at the same time. The
feature selection lattice is examined in a DFS (depth first search) fashion, in two directions:

a)  from  the  full  set  node  toward  the  empty  set  node  and 

b)  from  the  empty  set  node  toward  the  full  set  node. 

The search is conducted simultaneously from both terminal nodes and concludes in the middle of the lattice,
resulting in a path that goes from the top to the bottom of the feature selection lattice. The path is
determined by a local comparison of values of the criterion J associated with the feature subset evaluation.
At every step, in the forward as well as in the backward search, for the current feature subset all its successor
nodes (its subsets in the forward direction and supersets in the backward direction) are evaluated and the
most promising ones are selected. If there is a conflict, then the second best successors are selected. A
conflict arises if at a given step the same feature is selected in both directions, that is, is chosen to be
both added and discarded. This corresponds to the situation in the sequential forward selection algorithm
mentioned above: a feature is considered good by the forward selection method but the backward selection
algorithm indicates that it also could be removed with no harm. In other words, the conflict suggests
that the information obtained from the two methods is contradictory and should be disregarded. If this
conflict were not resolved, it would be impossible to conclude both searches in the same place in the feature
selection lattice, for no path connecting the two current nodes contains both nodes determined by
adding and removing the same feature at the same time.
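
The procedure can be sketched as two greedy searches that share the set of still undecided features. This is an illustrative reading of the description, not the authors' implementation: the function name and toy criterion are hypothetical, lower criterion values are taken to be better, and the fallback used when the two second-best choices coincide as well (keep the best addition, drop the second-best removal) is an assumption added so that the two searches always meet.

```python
def bidirectional_search(n, criterion):
    """Return a path of nested subsets from the full set down to the empty set."""
    top, bottom = frozenset(range(n)), frozenset()
    down_path, up_path = [top], [bottom]
    while len(top - bottom) >= 2:
        free = sorted(top - bottom)  # features not yet decided by either search
        drop_rank = sorted(free, key=lambda f: criterion(top - {f}))
        add_rank = sorted(free, key=lambda f: criterion(bottom | {f}))
        drop, add = drop_rank[0], add_rank[0]
        if drop == add:
            # Conflict: the same feature is both added and discarded.
            drop, add = drop_rank[1], add_rank[1]   # second-best successors
            if drop == add:                         # assumed tie-break, see lead-in
                add = add_rank[0]
        top, bottom = top - {drop}, bottom | {add}
        down_path.append(top)
        up_path.append(bottom)
    if top == bottom:
        up_path.pop()  # both searches ended on the same node; list it once
    return down_path + up_path[::-1]

def toy_error(subset):
    """Hypothetical error rate: features 0 and 2 help, every feature costs 0.01."""
    useful = {0: 0.20, 2: 0.15}
    return 0.5 - sum(useful.get(f, 0.0) for f in subset) + 0.01 * len(subset)

path = bidirectional_search(4, toy_error)
```

Because the feature removed from the upper node and the feature added to the lower node always differ, the lower node stays a subset of the upper one, and the two partial paths join into a single chain through the lattice.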

An example result of using the bidirectional search procedure is given in Fig. 5. This
comparison was made for the feature selection lattice obtained from a piecewise linear classifier trained on
a synthetic data set with known properties. The analysis of the lattice suggests that the error rate
is nonmonotonic (in fact the data contained six deliberately inserted irrelevant features). As one
can see, the resulting error rates along the path selected by the bidirectional search seem to follow
closely the minimum error rates obtained from exhaustive analysis. This encourages the use of the
bidirectional search algorithm to predict the value of threshold for efficient branch and bound search.
Moreover, the bidirectional search is insensitive to the monotonicity of the error rate function.

There is one additional benefit of scanning the feature selection lattice for the best path: we
can estimate the number of nodes that have to be examined.

2.2. OTHER AI METHODS IN FEATURE SELECTION. Both branch and bound and MEA are
commonly recognized as techniques developed by AI researchers. While the branch and bound algorithm can
supply the optimum solution, provided that some conditions are satisfied, the other AI techniques are
typically heuristic and they generally do not give a guarantee of finding the optimal solution. However, when
the original dimensionality d of feature space is large (say d > 15) then the optimality must be given up
because the complexity of the problem impedes the use of the branch and bound technique. Of course, one
could start enumerating nodes in the feature selection lattice in the backward fashion, that is from the node
associated with the empty set. However, if the selected threshold allows the algorithm to visit nodes too deep
in the lattice, this solution would be as useless as the original version of the branch and bound enumeration
scheme. In such a case we have to look for substitute solutions, which are most likely nonoptimal.

The branch and bound technique in application to the search in the feature selection lattice is nothing
but a method of enumerating nodes in this graph. Its advantage over other possible enumeration schemes is
that no node is examined more than once and, therefore, by forcing the algorithm to backtrack earlier
than at the terminal node corresponding to the empty set, we can eliminate parts of the lattice, which for
some reason (in our case the nodes with excessive error rates) are of no interest to us, and increase the
efficiency of the search.

Other known AI techniques do not have this property. They guarantee the optimum solution in feature
selection only if they are allowed to do exhaustive search. The following are a few examples:

a) DFS, depth first search, which, if terminated without backtracking, turns out to be the sequential forward
or backward selection,

b) BFS, breadth first search, which has no equivalent in feature selection literature and

c) best-first search, which also has no equivalent.

In the best-first search method one expands the top node and builds from its descendants a queue according
to decreasing values of the so-called heuristic evaluation function associated with them. Next, the first node
in the queue is expanded and the queue updated. This process is repeated until a goal node is detected. In
the feature selection problem we do not explicitly look for a goal node, because we are unable to detect
whether a given node is a goal or not. Instead, we are interested in searching some part of the feature
selection lattice, which should contain the node that is optimal with regard to some assumed criterion.
The heuristic evaluation function would be another unknown in the definition of the best-first search
method. For instance, one could assume that the observed error rate of a classifier is this function.

The major disadvantage of the best-first search algorithm is its space complexity (i.e., the size of the
computer memory required to execute an algorithm), which in the worst case, when all nodes from the
middle level in the lattice have to be placed in the queue, equals

$$\binom{d}{\lfloor d/2 \rfloor},$$

where d is the total number of features. This number is prohibitive even for d as small as 20, so the full
queue cannot be stored in the computer's memory. However, we can fix in advance the maximum size of
the queue; this version of the best-first search procedure is referred to as the beam search technique. For
instance, only the feasible subsets can be stored in the queue (although this does not give a full guarantee
that the queue will be limited to a reasonable size).
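
For a sense of scale, the number of nodes on the middle level of the lattice (the subsets containing half of the d features) can be computed directly. The helper name is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def middle_level_size(d):
    """Number of feature subsets of size floor(d/2) among d features."""
    return comb(d, d // 2)

sizes = {d: middle_level_size(d) for d in (4, 10, 20)}
```

For d = 20 this gives 184,756 subsets on a single level, each of which would need its own queue entry and its own error estimate.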

The limitation of the size of the queue has two consequences. First, the space complexity is much less
and can be arbitrarily set. Second, some nodes and, as a result, some parts of the feature selection lattice
are never visited, which significantly improves the efficiency of the beam search compared to the exhaustive
search scheme pursued by the best-first search algorithm. The second observation suggests also that the
beam search procedure may not find the optimal solution (unless all feasible nodes are stored in the queue
and the error rate has the monotonicity property).
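
A bounded-queue best-first search of this kind can be sketched as follows. The sketch makes several assumptions that the text leaves open: the observed error rate serves as the heuristic evaluation function, only feasible non-empty subsets enter the queue, the search starts from the full feature set (assumed feasible), and the result is the smallest feasible subset visited, with ties broken by lower error. All names and numbers are hypothetical.

```python
def beam_search(n, error_rate, threshold, beam_width):
    """Best-first search over the subset lattice with a queue of at most beam_width nodes."""
    full = frozenset(range(n))
    queue, seen = [full], {full}
    best, expanded = full, 0
    while queue:
        node = queue.pop(0)  # most promising node (lowest error) comes first
        expanded += 1
        if (len(node), error_rate(node)) < (len(best), error_rate(best)):
            best = node
        for f in sorted(node):
            child = node - {f}
            if not child or child in seen:
                continue
            seen.add(child)
            if error_rate(child) <= threshold:  # store only feasible subsets
                queue.append(child)
        queue.sort(key=error_rate)
        del queue[beam_width:]                  # enforce the maximum queue size
    return best, expanded

def toy_error(subset):
    """Hypothetical error rate: features 0 and 2 help, every feature costs 0.01."""
    useful = {0: 0.20, 2: 0.15}
    return 0.5 - sum(useful.get(f, 0.0) for f in subset) + 0.01 * len(subset)

best, expanded = beam_search(4, toy_error, threshold=0.20, beam_width=1)
```

With `beam_width=1` the search follows a single greedy path downward from the full set; larger widths trade memory and time for a wider view of the lattice.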

Recently  we  have  conducted  experiments  with  a  version  of  the  beam  search  procedure  which  incorporates 
into  the  heuristic  evaluation  function  not  only  the  error  rate  associated  with  the  current  node  but  also  uses 
some  prediction  scheme  to  speed  up  the  search  process.  Its  algorithm  contains  the  same  elements  as  the 
original  beam  search.  First  the  top  node  is  expanded  and  the  priority  queue  built  according  to  increasing 
values  of  the  error  rate  associated  with  each  descendant.  Next,  from  a  few  levels  ahead  some  assumed 
number  of  nodes  is  drawn  at  random  from  the  lattice.  If  there  is  a  node  with  the  error  rate  lower  than  the 
error  rate  of  the  first  node  in  the  queue,  then  as  the  current  node  we  choose  the  best  node  in  the  queue  from 
which  there  is  a  path  toward  the  node  on  the  lower  level.  Otherwise,  we  select  the  first  node  in  the  queue 
as the best node. Finally, we expand the best node, and the process of generating goal nodes (these nodes
are drawn only from levels below the level at which the current best node is placed) and selecting the
next best node is repeated. The natural stop condition is satisfied if all prospective nodes are infeasible with
regard to a given threshold.

The  drawback  to  this  method  is  that,  for  a  high  dimensionality  of  the  feature  space,  the  number  of 
nodes  checked  may  become  very  large,  and  the  stop  condition  may  not  be  reached  soon  enough.  This  could 
be  resolved  in  one  of  the  following  ways: 

a)  by  setting  the  number  of  nodes  checked  to  a  finite  number, 

b)  by  making  the  stop  condition  user  interactive  or 

c)  by  using  the  two  options  given  above. 

We  have  tested  this  algorithm  on  the  data  used  for  the  bidirectional  search.  The  results  are  very  encouraging: 
for each data set this method performed as well as the branch and bound algorithm, but the number of
examined nodes was much smaller. However, we emphasise that the beam search algorithm is suboptimal, and
for  this  reason  it  is  not  competitive  with  branch  and  bound  wherever  the  latter  method  can  be  used.  On 
the  other  hand,  its  usefulness  can  be  appreciated  in  feature  selection  problems  in  which  the  dimensionality 
of  the  feature  space  prohibits  the  use  of  the  branch  and  bound  enumeration  scheme. 

Another  interesting  aspect  of  the  beam  search  technique  is  that  it  can  be  viewed  as  a  generalization  of 
the  popular  sequential  selection  methods.  Namely,  if  the  queue  size  is  assumed  to  be  equal  to  one  and  we 
start  searching  from  the  node  associated  with  the  full  set  of  features,  then  this  algorithm  is  equivalent  to 
the  sequential  backward  selection  method. 
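
The relationship between beam search and sequential backward selection can be illustrated with a minimal sketch. It is not the implementation used in the experiments: it omits the random look-ahead step described above, and the names `beam_search_backward`, `sequential_backward_selection`, `error_rate`, `queue_size` and `threshold` are hypothetical. The criterion is supplied by the caller, and the full feature set is assumed to be feasible.

```python
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[int]
Criterion = Callable[[Subset], float]


def beam_search_backward(n_features: int, error_rate: Criterion,
                         queue_size: int, threshold: float) -> Tuple[Subset, float]:
    """Search the feature lattice downward from the full feature set.

    Each queued subset is expanded by deleting one feature; only the
    `queue_size` feasible descendants with the lowest error rate are kept.
    """
    full = frozenset(range(n_features))
    best = (full, error_rate(full))
    queue: List[Subset] = [full]
    while queue:
        # Descendants of the queued nodes have exactly one feature fewer.
        children = {node - {f} for node in queue for f in node}
        scored = sorted(((error_rate(c), c) for c in children if c),
                        key=lambda t: (t[0], sorted(t[1])))
        feasible = [(e, c) for e, c in scored if e <= threshold]
        if not feasible:
            break  # stop condition: every prospective node is infeasible
        best = (feasible[0][1], feasible[0][0])
        queue = [c for _, c in feasible[:queue_size]]
    return best


def sequential_backward_selection(n_features: int, error_rate: Criterion,
                                  threshold: float) -> Tuple[Subset, float]:
    """Classical greedy backward elimination, for comparison."""
    current = frozenset(range(n_features))
    current_err = error_rate(current)
    while len(current) > 1:
        err, child = min(((error_rate(current - {f}), current - {f}) for f in current),
                         key=lambda t: (t[0], sorted(t[1])))
        if err > threshold:
            break
        current, current_err = child, err
    return current, current_err
```

With `queue_size=1` the first function follows the same path through the lattice as the second; larger queue sizes explore more of the lattice at a proportionally higher cost.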

3.  THE  FUTURE.  The  techniques  for  feature  selection  discussed  so  far  assumed  a  search  for  the  best 
subset  of  features  among  a  number  of  feasible  subsets.  Such  a  statement  of  the  problem  presents  several 
disadvantages: 

a) It leads to an NP-hard problem, which for larger tasks must be solved with the aid of suboptimal methods.
These methods, by definition, do not guarantee that the selected subset is optimal.

b) Given a selected optimal subset of features we are still unable to determine the usefulness of a particular
feature (sometimes called its discriminatory power).

c) If a feature selection process uses a criterion function involving an error rate of a classifier, then it
must be immediately recognized as a process in which this classifier is optimized and, therefore, trained.
Hence the only error rate that can be computed for this classifier is an apparent error rate, which is
known to be very biased.

It is very likely that a panacea for all these problems does not exist, although we can try to solve each
of them independently. For instance, instead of selecting a subset of features we can evaluate the discriminatory
power of each feature. A potentially promising approach is based on the concept of classifiers optimized with
regard to the use of available features [11]. In this approach we define a classifier as a function F: X × P → Ω,
where X is a feature space, P is a set of parameters of the classifier and Ω is a set of class labels (decisions).
We assume that our classifier is trainable with respect to a criterion function J: P → ℝ, where ℝ is the set of
real numbers, that is, we can find a parameter vector p* ∈ P such that J(p*) is a minimum. In this notation
a classifier F(·, p*) is assumed to be an optimally trained classifier. Now suppose we discard the i-th feature,
that is, for any two feature vectors

x = [x_1, ..., x_i, ..., x_n]^T and x' = [x_1, ..., x_i', ..., x_n]^T with x_i ≠ x_i' we have F(x, p) = F(x', p).

We call a classifier a scalable classifier if the effect described above can be accomplished by imposing a
certain value on the parameter vector, p. In fact, many known classifiers, including linear, piecewise linear
and quadratic ones, are scalable classifiers. Other types of classifiers, like the k-NN rule or classifiers based on
density function estimation, which involve the use of distance functions, can be transformed to satisfy the
definition of scalable classifiers.

In [11] we have shown that if a classifier is a scalable classifier then its error rate satisfies the monotonicity
condition provided that we use an optimum training procedure to minimize the error rate of the classifier.
This theorem can be rephrased for a linear classifier into the following form: if a linear classifier is trained
with the aid of a procedure that guarantees a minimum resubstitution error rate then this error rate is
monotonic over a sequence of nested feature subsets. With some additional assumptions a similar theorem
was proven for piecewise linear classifiers [10]. Now, if we optimally train a scalable classifier, then we will
obtain a vector of optimal parameters, p*. Since some of these parameters are responsible for amplifying or
reducing the influence of each feature (for instance, weights in linear classifiers), by comparing them we
can deduce the discriminatory power of each feature (assuming that all features are statistically equivalent,
that is, they have the same mean and variance).

The following approach can be called a fuzzy formulation of the feature selection problem:

"Given a scalable classifier, train it optimally over a set of statistically equivalent features and compare
parameters associated with each feature. These parameters can be viewed as values of a fuzzy membership
function computed for associated features and their relatively large values indicate high discriminatory power
of these features."

Unfortunately,  the  optimal  training  procedures  are  not  known  so  far. 

The approach sketched above may solve the first two problems. However, the problem of biasedness of
the error rate of a classifier built for a selected optimal subset of features is more complicated. We could
try to use a form of cross-validation for feature selection. Given a finite sample we divide it into two parts:
the training set and the test set. Next, we design a classifier and perform feature selection for this classifier
based on its observed error rate. Finally, we compute a new error estimate for the classifier whose design is
based on the selected optimal subset of features. This procedure is repeated a number of times, and each
time we divide the data set into two subsets in a different way. At the end of the process we take a mean of
all estimated final error rates. This estimator is known as a rotation error estimator, and is less biased than
the resubstitution error estimator. However, this solution has one significant drawback: it requires that the
feature selection process be repeated a number of times, which further increases the already discouraging
computational complexity of the problem.
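
The rotation procedure can be outlined in code. The sketch below is only a skeleton under stated assumptions: the caller supplies two hypothetical callables, `select_features` (feature selection on the training part only) and `train_and_test` (design the classifier on the training part with the chosen features and return its error rate on the test part). The random splitting scheme, `n_rotations` and `test_fraction` are illustrative choices, not taken from the text.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def rotation_error_estimate(
    data: List[Sample],
    select_features: Callable[[List[Sample]], List[int]],
    train_and_test: Callable[[List[Sample], List[Sample], List[int]], float],
    n_rotations: int = 10,
    test_fraction: float = 0.5,
    seed: int = 0,
) -> float:
    """Mean test error over several different training/test divisions."""
    rng = random.Random(seed)
    n_test = max(1, int(len(data) * test_fraction))
    errors = []
    for _ in range(n_rotations):
        shuffled = data[:]
        rng.shuffle(shuffled)  # a different division each time
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Feature selection is repeated inside every rotation and sees
        # only the training part, so the test part stays untouched.
        features = select_features(train)
        errors.append(train_and_test(train, test, features))
    return sum(errors) / len(errors)
```

The call to `select_features` inside the loop is exactly the drawback noted above: the whole selection process is paid for once per rotation.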

Abstract. The problem of finding low-cost testing procedures to isolate a faulty, or
diseased, object has been extensively studied. We enlarge the problem by allowing treatment of
the faulty object before the identity of the object is completely known, a common occurrence in
real life. The problem is formalized, a general well-known solution method is mentioned, and a
limited subcase is explored, including presentation of exact (optimal) and approximate solution
methods. We illustrate the complexity of the extended problem and why results analogous to
those for the simpler binary testing problem are unlikely.

1. Introduction. There is an extensive literature on the binary testing problem, a
problem featuring the analysis of optimal and near-optimal test procedures with respect to
expected cost. (See [1] for a survey of literature on this general problem.) The solutions are
presented as decision trees. While working in this area we were struck by the fact that for
computer scientists, physicians and anyone else interested in repairing faults as well as finding faults
this was the wrong problem. In many situations one wishes not to isolate the fault as the final
solution but to treat the fault, and treatment may often occur before the fault is isolated.
Indeed, the treatment can also be in part a test: "Take two aspirin and, if not better, see me in
the morning". Surprisingly, no theory of tests and treatments parallel to the theory of binary
testing seems to appear in the literature. We outline a model for the test-and-treatment
problem and present some initial results for this model. A special case of the binary testing problem
(the "complete" test case) has a simple and quickly computed solution, the Huffman coding
procedure. We had hoped that there might be an interesting generalization of the Huffman
procedure for the analogous case in the test-and-treatment problem but the rich interactions
between nodes, which themselves are clusters of treatments, seem to preclude a simple
algorithm to solve this special case. We then discuss an approximation algorithm for the simplest
case and illustrate the node interaction that makes finding simple optimal algorithms difficult.

2. The Model. We first present the model for the binary testing problem because the
model we consider is an extension of this binary testing model. The binary testing problem is
presented by n objects, n a priori probabilities of fault, and m tests with associated costs. Let
U = {o_1, ..., o_n} denote the n objects, let {p_1, ..., p_n} denote the n a priori probabilities
that report the user's estimate of the likelihood of the corresponding object being faulty, and let
{T_1, ..., T_m} denote the m binary tests with associated costs {C_1, ..., C_m}.

We  make  various  assumptions  to  simplify  the  problem  analytically.  We  assume  that  there 
is only one faulty object, so Σ p_i = 1. The assumption that tests are binary means that they are
reliable  and  unambiguous;  in  particular  we  can  model  a  test  by  a  subset  of  the  universe.  The 
test  set  is  defined  as  follows:  an  object  is  placed  in  the  test  set  if  the  test  gives  a  positive 

*This research has been partially supported by the Air Force Office of Scientific Research under Grant AFOSR-83-0205 and
by the Army Research Office under Grant DAAG2V-M-K-0O73.
response when that object is the faulty object. We will let T_i denote the test set as well as the
underlying test, because functionally they are equivalent. Previous analytical work has usually
assumed that the tests all have the same cost, chosen arbitrarily as a unit cost; i.e., C_i = 1 for all i.

Although  this  may  seem  draconian,  much  has  been  learned  using  this  restriction.  This 
knowledge  serves  as  a  message  regarding  the  more  general  case  with  arbitrary  costs.  (See  [2], 
[3]).  We  return  to  this  point  later. 

The  outcome  of  the  problem  is  a  decision  tree  that  instructs  one  as  to  how  to  apply  the 
tests,  where  the  choice  of  test  is  a  function  of  the  outcome  of  previous  tests.  For  any  particular 
problem  one  follows  a  single  path  of  the  tree,  branching  as  determined  by  test  outcome,  until 
the  faulty  object  is  isolated.  We  seek  the  tree  of  minimum  expected  cost,  where  expected  cost  is 
given  by 


EC = \sum_{i=1}^{n} Path_i \cdot p_i \qquad (1)


with Path_i defined as the sum of the costs of the tests encountered. In the case of uniform cost
for tests Path_i becomes the number of tests encountered.

The model for the (binary) test-and-treatment problem that we adopt here extends the
binary testing model by the addition of treatments {T_{m+1}, ..., T_{m+r}} with associated a priori
probabilities {p_{m+1}, ..., p_{m+r}} and associated costs {C_{m+1}, ..., C_{m+r}}. Our indexing convention
reserves the first m indices for tests and the last r indices for treatments, which allows a
uniform notation for both tests and treatments. Like tests, treatments are representable by
subsets of U, but of course the meaning is quite different. If a treatment is applied then the
unknown object is considered (completely) treated if it is in the treatment set, and not treated (or
otherwise altered) if it is not in the treatment set. However, in each case the cost of the
treatment is incurred. In the decision tree that represents a given test-and-treatment (TTr)
procedure there would be only one arc below a node representing a treatment, the arc that
represents the continuing path for non-treatment, i.e., when the unknown object is not in the
treatment set. The procedure must treat the unknown object, so every branch of the decision
tree will end in a treatment.

Our objective is still the same: to find procedures that minimize the expected cost.
Expected cost is still defined by formula (1) used for the binary testing problem, but the notion
of path now changes to include treatment nodes.
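
To make the extended notion of path concrete, the following sketch evaluates the expected cost of a given TTr procedure by tracing each object through the decision tree. The class names `BinaryTest` and `Treatment` and the functions `path_cost` and `expected_cost` are hypothetical names chosen for illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Union


@dataclass
class BinaryTest:
    positive: FrozenSet[str]   # the test set T_i
    cost: float
    if_positive: "Node"
    if_negative: "Node"


@dataclass
class Treatment:
    treated: FrozenSet[str]    # the treatment set
    cost: float
    otherwise: Optional["Node"] = None  # single arc followed when not treated


Node = Union[BinaryTest, Treatment]


def path_cost(root: Node, faulty: str) -> float:
    """Sum of test and treatment costs met before `faulty` is treated."""
    total, node = 0.0, root
    while node is not None:
        total += node.cost  # the cost is incurred whether or not it helps
        if isinstance(node, Treatment):
            if faulty in node.treated:
                return total
            node = node.otherwise
        else:
            node = node.if_positive if faulty in node.positive else node.if_negative
    raise ValueError(f"object {faulty!r} is never treated by this procedure")


def expected_cost(root: Node, prob: Dict[str, float]) -> float:
    """Formula (1): sum over objects of Path_i times p_i."""
    return sum(p * path_cost(root, obj) for obj, p in prob.items())
```

For instance, with probabilities 0.5, 0.25, 0.25 for objects a, b, c, a procedure that first treats {a} at cost 1, then tests {b} at cost 1 and finishes with a singleton treatment of cost 2 on either branch has expected cost 0.5·1 + 0.25·4 + 0.25·4 = 2.5.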

Figure 1 is an example of a test-and-treatment problem presentation with two TTr
procedures presented. Although we believe that Procedure 2 is optimal, the computation to
establish that is sufficiently time consuming that optimality has not been proven. The decision trees
have been stylized for easier reading. Although a treatment should have only one arc below it,
we have added a second arc, with double lines, to record at the end of that arc the objects
treated. Technically the objects treated should label the treatment node itself, the convention
we follow when the treatment is at the end of a path and all objects associated with that path
are treated. In general, a test or treatment labels a node, and an object with its associated a
priori probability (alternatively, its weight) labels the end of a path. The expected cost value
for each procedure is also given.

A Sample Test-and-Treatment Problem
Figure 1


3. Finding Low-Cost Procedures. Using an appropriate dynamic programming or
branch-and-bound algorithm, one can determine a minimum cost TTr procedure (decision tree)
even with arbitrary test costs. However, such algorithms take at least m 2^n steps to execute in
the general case, where, as before, n is the number of objects and m is the number of tests.
This exponential growth in the number of objects means that in practice only small problems
can be solved exactly. Thus there has been a great interest in fast algorithms that find low-cost
(but not necessarily optimal) test procedures. When algorithms are presented, the obvious
questions are: (1) how fast is the algorithm? (2) how close to optimal are the resulting test
procedures?

To  proceed  we  need  some  definitions. 

A complete TTr problem has a set of tests such that each subset S of U is both a test set
and a treatment set. (For tests it actually suffices that either S or U-S be a test, by symmetry
for tests.) The complete testing problem is a restriction of the complete TTr problem to test
sets.


A  complete  TTr  problem  (or  testing  problem)  is  an  important  subcase  because  it  is  assured 
that  whenever  a  test  (treatment)  is  desired,  it  exists  and  may  be  used.  The  importance  of  this 
subcase  is  documented  by  some  major  properties  for  the  binary  testing  procedure,  which  we 
now  state.  See  [2]  and  [3]  for  details  and  further  properties. 

The  following  hold  for  the  binary  testing  problem  with  unit  test  costs. 

I. There is an O(n log n) algorithm to find the optimal test procedure for the (unit cost)
complete testing problem.

II. The incomplete testing problem is NP-hard. (That is, all evidence is that such problems
cannot all be solved in O(n^k) steps for any given integer k.)

III. For the incomplete testing problem the most natural fast approximation algorithm has
been evaluated regarding its speed (easily seen as O(mn)) and its expected cost approximation to
optimal. See [2] and [3].

With  this  background  in  mind,  we  undertook  the  study  of  the  test-and-treatment  problem, 
which  as  mentioned  earlier,  seems  to  be  the  more  correct  problem  statement  for  most  real-life 
situations.  Our  original  goals  were: 

A.  To  find  a  fast  optimal  algorithm  for  the  complete  TTr  problem  (for  a  restricted  cost  case); 

B. For the incomplete TTr problem, to find a good approximation algorithm.

After  considerable  study  on  the  former  question  we  have  reason  to  doubt  the  interest, 
perhaps  even  the  feasibility,  of  our  first  goal.  Besides  giving  some  quite  restricted  results  we  will 
demonstrate  why  seeking  an  optimal  solution  may  not  be  worthwhile  except  in  the  restricted 
case  we  mention.  (We  have  to  date  done  limited  work  on  the  second  goal;  that  work  is  beyond 
the  scope  of  this  summary.) 

By an equiprobable TTr problem we mean any TTr problem where all a priori probabilities
have equal value, i.e., p_i = 1/n, where n is the number of objects in U.

We now consider briefly a dynamic programming solution to the general TTr problem.
For any subset S of U, we define EC(S) by

EC(S) = \min \Big\{ \min_{1 \le i \le m} \big( C_i \cdot |S| + EC(S \cap T_i) + EC(S - T_i) \big), \; \min_{m < i \le m+r} \big( C_i \cdot |S| + EC(S - T_i) \big) \Big\} \qquad (2)

where |S| denotes the sum of the weights of the objects in set S, C_i denotes the cost of test
or treatment i, EC(∅) = 0, and any term reducing to EC(S) itself on the right is undefined.
In general the cost of computing EC(U), the desired answer, is exponential in n because there
are 2^n subsets that need consideration. Our colleague Robert Wagner observed that if the
partial expected cost EC(S) depends only on the cardinality of S then this minimization can be
solved in O(n^2) steps. Basically, this is because one needs to know only EC(#S) rather than
EC(S) for all smaller subsets S. (#S denotes the cardinality of S.) This special case can be
realized for the equiprobable complete TTr problem with all test costs the same and treatment
costs proportional to the weight of the treatment set, for example. (See [4] for more details. A
method of parallel computation of the general dynamic programming formulation (2) for the
TTr problem is presented in [5].)
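
The dynamic programming recurrence can be written down directly as a memoized recursion over subsets. The sketch below is an illustration under stated assumptions, not the authors' program: the function name `min_expected_cost` and the representation of tests and treatments as (set, cost) pairs are hypothetical, and a term that would reduce to EC(S) itself is simply skipped. Its running time is exponential in the number of objects, as noted above.

```python
from functools import lru_cache
from math import inf
from typing import Dict, FrozenSet, List, Tuple

Action = Tuple[FrozenSet[str], float]  # (test set or treatment set, cost)


def min_expected_cost(weights: Dict[str, float],
                      tests: List[Action],
                      treatments: List[Action]) -> float:
    """Minimum expected cost EC(U) for a (possibly incomplete) TTr problem."""

    @lru_cache(maxsize=None)
    def ec(s: FrozenSet[str]) -> float:
        if not s:
            return 0.0  # EC of the empty set is 0
        w = sum(weights[o] for o in s)  # |S|: total weight of S
        best = inf
        for t_set, cost in tests:
            pos, neg = s & t_set, s - t_set
            if pos and neg:  # a test that does not split S would recurse on S
                best = min(best, cost * w + ec(pos) + ec(neg))
        for t_set, cost in treatments:
            rest = s - t_set
            if rest != s:    # a treatment that treats nothing in S is skipped
                best = min(best, cost * w + ec(rest))
        return best

    return ec(frozenset(weights))
```

A result of `inf` indicates that some object cannot be treated with the actions supplied.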

Before proceeding to discuss an approximation algorithm for a special case we should note
that the most obvious special case is not interesting. The complete TTr problem that has all
treatments as well as all tests with unit cost is clearly easy to solve: simply invoke the universal
treatment (that treatment with treatment set U) so that every object is treated by that one
treatment. That gives an expected cost of 1. Clearly, any other procedure is more costly. For
uniform treatment costs other than unit cost (the cost of each test) the answer remains the
same because it costs as much to treat a subset of U as to treat all of U.

Some experimental work has shown us that we often do quite well in an arbitrary TTr
problem (including the incomplete TTr problem case) if we choose the treatment with the
lowest cost/power ratio, invoke that treatment, and recurse on that strategy. By power we
simply mean the weight of the treatment set. This simple rule fails if a number of treatments
have nearly the same cost/power ratio. In the example in Figure 1, all the treatments in the
problem except the last have a cost/power ratio of 10, and this makes the outcome more
difficult to ascertain. The last treatment has a cost/power ratio less than 4, so by our guideline it should
be favored over the other treatments. Both of the procedures illustrated have this treatment near the top of
the decision tree, and the reader may wish to verify that placing other treatments before it
yields worse procedures. The example procedure does illustrate that one may want to use tests
before invoking the most effective treatment for best expected cost.
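
The cost/power rule of thumb can be sketched as a short greedy routine. This is an illustration only: the name `greedy_treatment_order` is hypothetical, the routine uses treatments alone (no tests), and it makes the assumption that on each recursion the power of a treatment is measured on the objects still untreated.

```python
from typing import Dict, FrozenSet, List, Tuple

Action = Tuple[FrozenSet[str], float]  # (treatment set, cost)


def greedy_treatment_order(weights: Dict[str, float],
                           treatments: List[Action]) -> Tuple[List[int], float]:
    """Repeatedly invoke the treatment with the lowest cost/power ratio.

    Returns the indices of the treatments in the order used and the
    expected cost of the resulting treatment-only procedure.
    """
    remaining = set(weights)
    order: List[int] = []
    expected = 0.0
    while remaining:
        candidates = []
        for idx, (t_set, cost) in enumerate(treatments):
            # Assumption: power = weight of the still-untreated part of the set.
            power = sum(weights[o] for o in t_set & remaining)
            if power > 0:
                candidates.append((cost / power, idx))
        if not candidates:
            raise ValueError("some objects cannot be treated")
        _, idx = min(candidates)  # ties are broken by the lower index
        t_set, cost = treatments[idx]
        # Every object still on the path pays for this treatment.
        expected += cost * sum(weights[o] for o in remaining)
        remaining -= t_set
        order.append(idx)
    return order, expected
```

When several treatments have nearly equal ratios the tie-breaking rule decides the outcome, which is the situation in which the text warns that the rule becomes unreliable.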

Because  the  single  most  important  determiner  of  value  for  treatments  seems  to  be  the 
cost/power  ratio,  our  special  case  investigations  have  focused  first  on  the  subcase  where  all 
treatments  have  the  same  cost/power  ratio.  We  hereafter  denote  that  ratio  by  k.  Test  costs 
will  be  fixed  at  unit  cost.  Again,  we  demand  that  all  tests  and  treatments  be  present. 

We  have  determined  certain  properties  of  optimal  procedures  for  this  special  case. 

Lemma 1. No non-singleton treatment appears in an optimal procedure for this subcase.

Lemma 2. All tests occur prior to treatments.


We omit all proofs, although the proof of Lemma 1 follows very quickly from the nature of
expected cost computations in this special subcase. Any multiobject treatment can be replaced
by a sequence of singleton treatments for the same objects, which we call a cascade, with lower
expected cost. For example, a single treatment given by a three-object treatment set can be
replaced by the cascade of Figure 2 with the expected cost then reduced.
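
The reduction in expected cost can be checked numerically. The sketch below assumes the cost structure of this subcase (a treatment whose set has weight w costs k·w) and compares the contribution to expected cost of one multiobject treatment with that of the corresponding cascade. The function names are hypothetical, and exact rational arithmetic is used so the comparison is free of rounding.

```python
from fractions import Fraction
from typing import Sequence


def single_treatment_cost(weights: Sequence[Fraction], k: int) -> Fraction:
    """One treatment of cost k*W paid by all objects, of total weight W."""
    total = sum(weights)
    return k * total * total


def cascade_cost(weights: Sequence[Fraction], k: int) -> Fraction:
    """Singleton treatments applied in the given order.

    The treatment for an object of weight w costs k*w and is paid by every
    object that has not yet been treated.
    """
    remaining = sum(weights)
    cost = Fraction(0)
    for w in weights:
        cost += k * w * remaining
        remaining -= w
    return cost
```

For weights 1/4, 1/4, 1/2 and k = 16 the single treatment contributes 16 while the cascade contributes 16·(1/4 + 3/16 + 1/4) = 11. In general the cascade saves k times the sum of the pairwise products of the weights, which is positive whenever at least two objects with positive weight are involved.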


A  cascade 
Figure  2 

We now give an approximation algorithm for a yet more restricted subcase, namely, the
subcase under consideration (complete TTr problem, unit cost tests, treatments of cost kw
where w is the weight of the treatment set) plus the equiprobable requirement. We noted that
for the special case the dynamic programming method allowed computation of the optimal
decision tree in O(n^2) steps. The approximation algorithm gives a decision tree in time essentially
independent of n and k (constant time). We present a bound on the relative error, a result
that is non-trivial, requiring considerable understanding of the nature of low-cost decision trees
for this problem. We will close the paper with an attempt to illustrate why a fast optimal
algorithm (beyond the special case dynamic programming algorithm) is likely to be complex and not
likely to be found soon, if it exists. Also the problem of finding approximation algorithms for
less restricted classes must take these concerns into consideration.
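
For the equiprobable special case, the O(n^2) dynamic program over cardinalities can be sketched as follows. This is a reconstruction from the description in the text rather than the authors' program: the function name is hypothetical, treatments are assumed to be chosen inside the current set (a larger treatment set only costs more under this cost structure), and exact fractions are used.

```python
from fractions import Fraction


def optimal_cost_by_cardinality(n: int, k: int) -> Fraction:
    """Optimal expected cost for n equiprobable objects.

    Assumes the complete TTr problem with unit-cost tests and treatments
    of cost k*w, where w is the weight of the treatment set.
    e[s] is the best cost contribution of a set of s objects.
    """
    e = [Fraction(0)] * (n + 1)
    for s in range(1, n + 1):
        weight = Fraction(s, n)  # total weight of the s objects
        # Treat t of the s objects with one treatment of cost k*t/n.
        best = min(Fraction(k * t, n) * weight + e[s - t] for t in range(1, s + 1))
        # Apply a unit-cost test that splits the set into a and s - a objects.
        for a in range(1, s // 2 + 1):
            best = min(best, weight + e[a] + e[s - a])
        e[s] = best
    return e[n]
```

Each of the n table entries is filled by scanning O(n) alternatives, which gives the quadratic running time. For n = 4 and k = 1 the routine returns 5/8, the cost of a single four-object cascade.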

The decision tree class to be used for this special case is a simple class. The trees have the
following properties:

a) the superleaves of the tree are all at the same level, where each superleaf is a cascade (so the
position of the superleaf locates the root of the cascade);

b) the cascades differ by at most one in the number of objects treated;

c) the level of occurrence of the superleaves is level l (the root is level 0) where l is the least
integer z such that

k ≤ 4·2^z.

Thus all tests, and only tests, occur above level l with nearly identically formed cascades
beginning at level l. The cost/power ratio k determines the transition level.
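
A small sketch shows how the transition level and the cascade sizes of such a tree follow from n and k. The function names are hypothetical, and two assumptions are made explicit: levels are taken to be nonnegative, and n is at least the number of cascades.

```python
from typing import List


def transition_level(k: float) -> int:
    """Least integer z >= 0 such that k <= 4 * 2**z."""
    z = 0
    while k > 4 * 2 ** z:
        z += 1
    return z


def cascade_sizes(n: int, level: int) -> List[int]:
    """Sizes of the 2**level cascades; they differ by at most one object."""
    count = 2 ** level
    if n < count:
        raise ValueError("fewer objects than cascades")
    q, r = divmod(n, count)
    return [q + 1] * r + [q] * (count - r)
```

Because only a comparison of k with powers of two is needed, the shape of the tree is available without any search.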

We shall call the class just defined the class of level l procedures. See Figure 3 for
examples of level l procedures.

An  upper  bound  that  holds  for  all  k  is  given  by 

EC,--*}-  <  1/1  (3) 


where EC_l is the expected cost of the above mentioned decision tree and opt is the minimal
expected cost possible. Experimentation shows that for small n the approximation is actually
much better than the upper bound suggests. The actual relative error does not seem to improve
with n for a fixed k, and also varies considerably with k even for small values of k. The upper
bound is as weak as it is partly because it represents all values of k. The theorem statement
below helps explain the varying relative error for small k.

The  relative  error  result  follows  from  a  key  theorem  that  holds  for  this  special  case. 

Theorem. For every n > 0 and every l > 0, there is a cost/power ratio value k such that the
n-object level l procedure is an optimal procedure for that value of k.

One should note that for each n and l there is only one n-object level l decision tree
modulo the left-right orientation of branches of binary trees. The theorem states that this tree
is optimal for some k. We can determine some of these k values but the expression is messy.
At intermediate k values the approximation seems quite good but is hard to characterize
analytically. Thus our relatively modest upper bound.




What is at least as interesting as this upper bound is a characteristic of optimal trees that
we can hint at by example. Although there is symmetry to the problem presentation
(equiprobable weights, costs uniform for tests and dependence only on the number of objects in
treatments) the optimal tree is not necessarily fully symmetric, due to what we call "migration of
objects" from cascade to cascade as k changes. Without going into the specific analysis, we
demonstrate the effect in Figure 3 where we present three decision trees for a specific TTr
problem.

For Figure 3 we choose n = 64 and k = 16, which by our formula for determining the level l,
l = min{z : k ≤ 4·2^z}, puts l = 2 by virtue of the equal sign; had k = 16.001 then l = 3 would be
needed. That is reflected by the same expected cost value for the level 2 tree and the level 3
tree. (The circle with enclosed number represents the number of elements in a cascade; we
chose n so that all cascades are equally populated to remove the "excess objects" effect. The
branching above the superleaves represents tests that split the relevant sets of objects exactly in
half.)


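Level l procedures are not always optimal

Example: n = 64, k = 16

level 2 procedure: 16 obj/cascade

EC_2 = \frac{1}{64}\Big[ l \cdot 64 + \frac{k}{64}\Big( 4 \sum_{i=1}^{16} i \Big) \Big]
     = \frac{1}{64}\Big[ 2 \cdot 64 + \frac{1}{4}(4 \cdot 136) \Big] = \frac{1}{64}(264)

level 3 procedure: 8 obj/cascade

EC_3 = \frac{1}{64}\Big[ 3 \cdot 64 + \frac{1}{4}(8 \cdot 36) \Big] = \frac{1}{64}(264)

level 2,3 tree:

EC_{2,3} = \frac{1}{64}\Big[ 2 \cdot 28 + 3 \cdot 36 + \frac{1}{4}(4 \cdot 45 + 2 \cdot 105) \Big]
         = \frac{1}{64}[56 + 108 + 45 + 52.5]

Figure 3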

The third tree of Figure 3 has cascades with roots at levels 2 and 3, so it is not a level l
procedure. Yet its expected cost is lower than that of the two level l trees. One might have expected,
even hoped, at the transition value of k (viewing k as increasing) where a level 2 tree passes to
a level 3 tree, that an even splitting of each cascade from 16 members to two cascades of 8
members each would yield optimal trees. We see by example that instead the 8-member
cascades absorb an extra member and allow a 14-member cascade which is not "big enough" to
split at that k value. This is what we term "migration". This migration can be characterized
but the resulting algebra makes computation very messy and the determination of an algorithm
for finding optimal trees very unpleasant, if doable, even for this simple case. It is better to use
the dynamic programming formulation in this special case if optimal trees are needed, and in
general we surely will settle for approximate solutions, even in the complete TTr problem case.
(Recall that the incomplete TTr problem is NP-hard anyway, since the simpler incomplete
testing problem is NP-hard.)

We now state briefly how the terms in the expected cost are computed in Figure 3. The
first line in the computation of EC_2, the expected cost for the level 2 procedure, outlines the
computation symbolically. We do not sum the terms Path_i · p_i directly but aggregate
components. First we factor out the common weight p_i (i.e., 1/64) so we need only determine Path_i.
The tests, of unit cost, cost l units for each of the 64 objects. The cascade is composed of unit
treatment costs of k/64, and there is one less object subject to each sequential treatment, so for
s objects in a cascade there are \sum_{i=1}^{s} i object-treatments. (Compare with (ks/64) · s for a single
treatment for all s objects.)
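
The aggregation just described is easy to check mechanically. The sketch below recomputes the three expected costs of Figure 3 from a list of (depth, size) pairs, one pair per cascade. The function name `tree_expected_cost` and this representation are hypothetical conveniences, and the routine assumes the equiprobable case with unit test costs and singleton treatment cost k/n.

```python
from fractions import Fraction
from typing import List, Tuple


def tree_expected_cost(n: int, k: int, cascades: List[Tuple[int, int]]) -> Fraction:
    """Expected cost of a tree whose cascades are given as (depth, size).

    Every object in a cascade rooted at depth d passes d unit-cost tests;
    a cascade of s objects contains s*(s+1)/2 object-treatments, each
    costing k/n.
    """
    if sum(size for _, size in cascades) != n:
        raise ValueError("cascade sizes must add up to n")
    tests = sum(depth * size for depth, size in cascades)
    object_treatments = sum(size * (size + 1) // 2 for _, size in cascades)
    return Fraction(1, n) * (tests + Fraction(k, n) * object_treatments)


level2 = [(2, 16)] * 4               # four 16-object cascades at level 2
level3 = [(3, 8)] * 8                # eight 8-object cascades at level 3
mixed = [(2, 14)] * 2 + [(3, 9)] * 4  # the level 2,3 tree of Figure 3
```

With n = 64 and k = 16 the two level l trees both evaluate to 264/64, while the mixed tree evaluates to 261.5/64, in agreement with the figure.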

We presently have a model for reasonable approximate procedures in the arbitrary weight
case but no upper bound on relative error yet. Such models may serve as well for the
incomplete case. Finding bounds on their relative error is another matter, however.

The integrated theory of test-and-treatment procedure design is clearly of interest and it is
our hope eventually to better understand how to find, with reasonable effort, good low-cost
procedures for accomplishing this task. We are hopeful that at least for the complete TTr problem we can
do well for the cost structure outlined here.

Acknowledgment.  We  wish  to  thank  Paul  Lanzkron  for  his  assistance  in  developing  a  program 
that  allowed  us  to  try  many  examples  for  insight  into  this  problem,  and  for  contributions 
towards  the  proof  of  the  upper  bound  result.  Paul  Lanzkron  will  be  a  co-author  in  the  paper 
that  fully  presents  the  material  outlined  here. 
ON THE ERRORS THAT LEARNING MACHINES WILL MAKE


A.W.  Biermann,  K.C.  Gilbert,  A.  Fahmy,  B.  Koster 
Department  of  Computer  Science 
Duke  University 
Durham,  N.  C.  27706 


A learning model has been studied where a binary function $f$ of $p$ binary variables $x_1, x_2, \ldots, x_p$ is to be
learned  from  example  input-output  behaviors.  At  each  time  t  =  1,2,3,...,  the  learning  machine  receives  a 
sample  input-output  pair  for  the  target  function  and  guesses  what  that  target  function  is. 

Most learning machines are capable of guessing (or learning) only L functions, where L is less than the number of all possible functions, $2^{2^p}$. There are advantages to having L large: more functions can be learned, and in general, when the target function is not precisely learnable there will be a learnable function not too distant from that target function. Thus the learning machine will be able to choose a function which agrees with the target function on most inputs even though it will be in error on some inputs.
There  are  also  advantages  to  having  L  small  if  the  problems  with  error  are  not  too  severe:  learning  can 
occur  much  more  quickly  if  there  are  fewer  learnable  functions  to  choose  from.  This  paper  is  concerned 
with  a  number  of  different  learning  machines,  their  associated  values  for  L  and  the  nature  of  the  trade-off 
between  having  large  L  and  little  expected  error  versus  having  small  L  and  short  expected  learning  time. 

For  example,  the  signature  table  learning  model  of  Arthur  Samuel  was  studied.  A  characterization 
of  the  class  of  learnable  functions  was  found  which  gives  insight  into  how  the  mechanism  works  and  what 
types  of  functions  it  can  acquire.  The  characterization  specifies  the  form  that  certain  matrices  of  function 
values  must  have  in  order  for  the  function  to  be  realized.  This  leads  to  a  methodology  for  computing  L 
and  estimating  the  expected  error  that  signature  table  systems  will  have  in  attempting  to  learn  a  ran¬ 
domly  selected  target  function  from  the  set  of  all  possible  functions. 

Other  learning  models  have  similarly  been  studied  such  as  the  linear  evaluation  systems,  the 
Boolean  conjunctive  normal  form  learning  methodology  of  Valiant,  and  “truncation  machines”  which  sim¬ 
ply  memorize  the  outputs  with  the  assumption  that  they  are  determined  by  a  specified  subset  of  the 
inputs. 
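
The trade-off between L and expected error can be illustrated for a truncation machine. The sketch below is only one reading of the description above, not the authors' program: it assumes the machine sees a fixed subset of k of the p inputs, so that it can represent 2^(2^k) functions, and that its best guess for each visible input pattern is the majority output of the target. The names `truncation_error` and `num_learnable` are hypothetical.

```python
from itertools import product

def truncation_error(target, p, subset):
    """Fewest inputs (out of 2**p) on which a function of the variables in
    `subset` alone must disagree with `target` (Hamming distance)."""
    groups = {}
    for x in product((0, 1), repeat=p):
        key = tuple(x[i] for i in subset)      # the part of x the machine sees
        groups.setdefault(key, []).append(target(x))
    # One output bit per visible pattern: the majority bit is the best choice,
    # so the minority count within each group is unavoidable error.
    return sum(min(sum(g), len(g) - sum(g)) for g in groups.values())

def num_learnable(k):
    """Number of distinct binary functions of k binary variables (assumed L)."""
    return 2 ** (2 ** k)

# Two illustrative targets on p = 3 variables.
parity = lambda x: x[0] ^ x[1] ^ x[2]
conj = lambda x: x[0] & x[1]

print(num_learnable(2), truncation_error(conj, 3, (0, 1)),
      truncation_error(parity, 3, (0, 1)))
```

Under these assumptions a conjunction of the two visible inputs is learned exactly, while three-variable parity is as far as possible from every learnable function: half of the eight inputs are misclassified whatever the machine memorizes.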

The  L  learnable  functions  for  a  given  machine  may  be  widely  spread  across  the  space  of  all  possible 
functions  so  that  every  possible  function  is  near,  using  Hamming  distance  as  a  measure,  some  learnable 
function. They may also be very poorly scattered so that some possible target functions are very far from
any  learnable  function.  In  order  to  gather  information  regarding  the  quality  of  these  learnable  function 
distributions,  a  new  learning  machine  was  invented,  the  “G-machine”,  which  spreads  its  learnable 
behaviors  in  a  near  optimal  fashion.  The  G-machine  thus  can  learn  with  very  low  expected  error  for  a 
given  value  of  L  and  serves  as  a  standard  for  comparison  with  other  learning  machines.  The  general 
result  in  some  simulations  was  that  most  learning  machines  achieved  expected  errors  which  were  surpris¬ 
ingly  close  to  the  best  known  values. 


This  paper  is  based  on  work  supported  by  the  U.S.  Army  Research  Office  under 
Grant  DAAG-29-84-K-0072  and  the  Air  Force  Office  of  Scientific  Research  Grant 
No.  81-0221. 
A MODEL OF DECISION MAKING
WITH  SEQUENTIAL  INFORMATION- ACQUISITION 
WITH  APPLICATION  TO  THE  FILE  SEARCH  PROBLEM 
West Lafayette, IN 47907


ABSTRACT.  We  present  the  file  search  problem  in  a  Decision  Theoretic  Frame¬ 
work,  and  discuss  a  variation  of  it  which  we  call  the  common  index  problem.  The  goal 
of  the  common  index  problem  is  to  return  the  best  available  record  in  the  file  where 
"best"  is  in  terms  of  a  class  of  user  preferences.  We  use  dynamic  programming  to  con¬ 
struct  an  optimal  algorithm  using  two  different  optimality  criteria,  and  we  develop 
sufficient  conditions  for  obtaining  complete  information. 


1.  Introduction 

Many  areas  of  computer  science  are  benefiting  from  the  application  of  economic 
decision  theory.  The  literature  in  data  base  design  [Mendelson  and  Saharia,  1986]  and 
expert systems [Hall et al., 1985] is starting to draw on theories of rational decision
making.  Mendelson  and  Saharia  use  a  decision  theoretic  framework  for  defining  an 
optimal  data  base  design.  They  begin  by  defining  a  minimum  cost  answer  to  an  infor¬ 
mation  request,  and  then  they  balance  the  trade-offs  between  incomplete  information 
costs  and  data-related  costs  over  the  set  of  possible  queries. 

Hall,  Moore  and  Whinston  [1985]  develop  a  decision  model  that  characterizes  the 
ideal  (economically  rational)  behavior  of  an  expert  and  present  the  model  as  a  theoreti¬ 
cal  basis  for  expert  systems.  The  decision  model  combines  sequential  information 
acquisition  with  classical  decision  theory.  Decision  making  is  viewed  as  a  dynamic 
process  where  knowledge  is  obtained  by  a  sequence  of  actions  followed  by  a  final  deci¬ 
sion.  The  information  gathering  actions  taken  are  determined  by  the  trade-off  between 
the cost of collecting more information and the payoff from a better decision. The construction of an expert system is then shown to correspond to determining a feasible
information  gathering  and  decision  strategy. 

Moore  and  Whinston  [1986]  expand  the  decision  model  developed  by  Hall  et  al., 
and  show  that  the  file  search  problem  can  be  interpreted  as  a  special  case  of  a  class  of 
decision  problems  called  the  categorization  problem.  They  show  that  there  are  essen¬ 
tially  two  mathematically  equivalent  ways  of  looking  at  file  search  -  from  a  decision 
theoretic  perspective  and  from  a  computer  science  viewpoint.  The  decision  theoretic 
approach  treats  file  search  as  a  decision  problem  with  a  payoff  based  on  the  value  of  a 
correct  answer  and  the  cost  of  retrieval,  and  then  develops  an  algorithm  that  maximizes 
the expected return. The computer science approach views the file search problem as
finding  the  algorithm  that  minimizes  the  expected  cost  (search  time).  Under  certain  cri¬ 
teria,  the  two  approaches  are  shown  to  be  equivalent,  and  the  decision  theoretic 
approach  is  used  to  construct  an  optimal  algorithm  that  is  equivalent  to  the  optimal 
binary  search  tree.  In  this  paper,  we  continue  to  use  the  decision  model  developed  by 
Moore  and  Whinston,  and  the  associated  information  economics  terminology.  Since 
this  terminology  is  not  universally  understood,  we  define  three  of  the  key  terms.  The 
idea  of  an  information  and  decision  strategy  corresponds  to  an  algorithm  in  computer 
science;  an  individual  utility  function  represents  a  person’s  preferences  over  a  given  set; 
and  a  social  welfare  function  is  an  aggregate  of  the  individual  utility  functions  and 
represents  the  well-being  of  the  users  as  a  whole.  At  the  end  of  this  paper  there  is  a 
Glossary  of  our  notation. 

We  generalize  the  file  search  using  the  decision  theoretic  approach  [Moore  and 
Whinston,  1986]  of  treating  the  selection  of  an  optimal  algorithm  as  an  optimization 
problem.  The  decision  theoretic  approach  provides  several  advantages  over  the  typical 
approach  used  in  computer  science.  Instead  of  constructing  an  algorithm  and  then 
evaluating  its  complexity,  we  are  able  to  develop  an  algorithm  that  we  know  to  be 
optimal  with  respect  to  predefined  criteria.  In  addition,  these  optimality  criteria  are  not 
limited  to  those  usually  used  in  computer  science  (worst  case  time  and  expected  time), 
but  include  functions  of  both  the  value  of  the  information  and  the  cost  of  the  retrieval. 
Thirdly, if there is a binary relation on the set of experiments that defines a total ordering over the experiments and induces a binary or trinary information structure on the state space, then we can use dynamic programming to calculate optimal algorithms.
These  conditions  are  general  enough  that  we  expect  dynamic  programming  can  be  used 
to  create  optimal  algorithms  for  a  broad  range  of  decision  problems.  Finally,  by  using 
the  decision  theoretic  framework  we  are  not  restricted  to  obtaining  complete  informa¬ 
tion,  and  therefore  finding  the  requested  element  in  the  file.  In  general,  if  the  file  is  very 
large,  it  will  not  be  optimal  to  find  the  best  record  in  the  file.  A  person  concerned  with 
finding  "the  best  job"  in  the  U.S.  for  him/her  would  generally  not  find  it  worthwhile  to 
search  the  whole  file,  even  if  it  were  available.  At  some  point  the  expected  cost  of 
further  search  would  outweigh  the  expected  benefits  of  further  search  even  if  the  only 
costs  involved  were  the  time  required  to  continue  the  search. 

We  use  the  decision  theoretic  framework  to  study  a  generalization  of  the  file 
search  problem  that  we  term  the  common  index  problem.  To  date,  the  primary  role  for 
file  search  and  query  processing  has  been  to  aid  a  decision  maker  by  supplying  a 
specific  record  or  request.  The  user  specifies  some  specific  record  and  if  the  record 
exists,  it  is  returned.  In  many  decision  problems,  however,  the  need  is  not  for  a  specific 
record,  but  for  the  best  record  available.  We  propose  that  for  these  goal  oriented 
queries, the choice process should be embedded within the query system (in a sense, making the query processor a generalized expert system). The common index problem is defined as follows: given a random individual, find the element in a set that maximizes the individual's utility, where the utility function is a function of the index. We give a formal definition and discussion of the common index problem in Section 3.

Motro  [1986]  has  taken  a  step  in  the  direction  of  goal-oriented  queries  by  propos¬ 
ing  a  distance  measure  for  data  base  queries  to  determine  the  best  available  records.  If 
the  ideal  record  for  a  query  is  not  in  the  data  base,  then  the  "closest"  available  record  is 
returned. In Motro's formulation, however, a record will be returned only if it is within some minimal distance of the ideal record, so null responses to a query are possible. The approach that Motro takes is ad hoc: it presents neither a theoretical basis for the use of a distance function nor a method for generating an optimal search algorithm.

In this paper, we confine ourselves to finding the best available record in a file. We restrict ourselves to searching a file in part to investigate the effect of the optimality criteria on the information and decision strategy. We are assuming that the search process will be repeated many times, and that our goal is to provide an algorithm for
searching  the  file  that  will  maximize  a  social  welfare  function.  Although  computer  sci¬ 
ence  has  generally  limited  itself  to  analyzing  an  algorithm’s  worst  case  time  and  occa¬ 
sionally  an  algorithm’s  expected  time,  we  shall  analyze  the  algorithms  using  two  dif¬ 
ferent  payoff  functions. 

In Section 2 of the paper we present a formal characterization of the general decision model. In Section 3 we define the common index problem, develop a dynamic programming solution for our characterization of it, and work out two examples, one with each optimality criterion.


2.  Decision  Model 

Our  decision  problem  is  defined  by  eight  elements: 

$\mathcal{D} = \langle X, \phi, D, \omega, A, \{M_a \mid a \in A\}, c, r \rangle$,

where:

$X$ = the set of possible (mutually exclusive) states. We use the generic notation "$x$" to denote elements of $X$.

$\phi: X \to [0,1]$ is the probability density function. $\phi$ defines the probability distribution function $\pi: P(X) \to [0,1]$ by:

$$\pi(Y) = \sum_{x \in Y} \phi(x) \quad \text{for } Y \subseteq X,$$

where "$P(X)$" denotes the power set of $X$.

$D$ = the set of available (final) decisions.

$\omega: X \times D \to \mathbb{R}$ is the gross payoff function.

$A$ is the set of "initial" (information-gathering) actions, or experiments, available.

$M_a$ is the information structure associated with action $a \in A$. (Each $M_a$ is a partition of $X$, as will be explained in more detail below.)

$c: A \to \mathbb{R}_+$ is the cost function; $c(a)$ is the cost of utilizing action $a \in A$.

$r$ is a positive integer representing the number of information-gathering actions which can be taken before a final decision is made.

Assumptions: $X$, $D$, and $A$ are all finite, and:

$$(\forall x \in X): \phi(x) > 0.$$

In particular, we shall assume that $A$ has $n+1$ elements, where $n \ge 1$, and write

$$A = \{0, 1, \ldots, n\}.$$
The decision-maker is assumed to have a finite set of feasible (final) decisions, $D$, and to receive a (net) payoff which depends upon the state of the environment, $x \in X$, the decision chosen, $d \in D$, and the cost of information-gathering, $c$. One may suppose (see, e.g., Marschak and Radner [1972]) that there is a deterministic relationship between decisions, states of the environment, costs and a set of outcomes (or effects), $E$, such that there exists an outcome function, $\rho(x,d)$, mapping the set $X \times D$ into the set of outcomes. If the decision-maker's preferences over the outcomes may be represented by a real-valued utility function, $u(e,d)$, for all $e \in E$ and $d \in D$, then the (gross) payoff function may be defined by

$$\omega(x,d) = u[\rho(x,d),\, d]. \tag{1}$$

For the remainder of our discussion we will take the payoff function, $\omega(\cdot)$, as given, and will identify the net payoff of a strategy with the difference between the gross payoff obtained and the cost of the information-gathering actions undertaken.¹

The  remaining  elements  of  our  decision  problem  revolve  around  the  construction 
of  an  information  structure,  and  the  costs  of  obtaining  information.  The  result  of  an 
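information acquisition strategy, $\alpha$, is a partition,

$$\mathcal{B} = \{B_1, \ldots, B_q\}$$

on $X$ such that $B_i \cap B_j = \emptyset$ for $i \ne j$, and $\bigcup_{i=1}^{q} B_i = X$. With each set $B \in \mathcal{B}$, there will be associated a cost of information-gathering, $C(B)$. Thus if the decision-maker follows the decision function $\delta: \mathcal{B} \to D$, the expected net payoff for the joint strategy $(\alpha, \mathcal{B}, \delta)$ will be given by:

$$\Omega^*(\alpha, \mathcal{B}, \delta) = \sum_{B \in \mathcal{B}} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{B \in \mathcal{B}} \pi(B)\,C(B) \tag{2}$$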

In  a  summary  statement,  we  can  roughly  describe  the  goal  of  the  decision  problem 
being  analyzed  as: 

Choose an information strategy $\alpha$ and a decision function $\delta: \mathcal{B} \to D$ in such a way as to maximize (2) over all $\alpha'$ and $\delta': \mathcal{B} \to D$.

Associated with each $a \in A$ is a set of information signals, $Y_a$, and a function $\eta_a: X \to Y_a$. We shall assume that each $Y_a$ contains a finite number, $n(a)$, of different signals, so that, without loss of generality, we can write:

$$Y_a = \{1, 2, \ldots, n(a)\}.$$

We shall also assume that:

i. for each $a \in A$, $\eta_a$ is onto $Y_a$, and

ii. $n(0) = 1$ (so that the $a = 0$ action is the null information action).

For a given element of the set of states, $x \in X$, there is a single signal receivable from each of the $n$ information signal sets. We shall only consider the case where information is obtained deterministically ("noiseless information"); however, it can be shown that noisy information can be incorporated within the present model by including the signals as a part of the specification of the state space. (See Marschak and Radner [1972].)

1. For further discussion of these points, see e.g. Marschak and Radner [1972], pp. 41-44, or DeGroot [1970], pp. 86-115.

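We define

$$M_{ay} = \{x \in X \mid \eta_a(x) = y\} = \eta_a^{-1}(\{y\}) \quad \text{for } a = 0, 1, \ldots, n;\ y = 1, \ldots, n(a);$$

and

$$M_a = \{M_{a1}, \ldots, M_{a\,n(a)}\} \quad \text{for } a = 0, 1, \ldots, n.$$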

2.1. Definition. Let $B \subseteq X$ be non-empty. We shall say that a family of subsets of $X$, $\mathcal{B}$, is an information structure on $B$ iff:

i. $\mathcal{B}$ is a partition of $B$ (that is, the sets in $\mathcal{B}$ are pairwise disjoint, and their union equals $B$).

ii. $(\forall B' \in \mathcal{B}): B' \ne \emptyset$.

Notice that for $a \in A$, $M_a$ is an information structure on $X$ (by Definition 2.1). We shall refer to $M_a$ as the information structure associated with (or induced by) $a$.

2.2. Definition. Let $B \subseteq X$ be non-empty, and let $a \in A$. We define the information structure induced on $B$ by $a$, $\mathcal{I}(B,a)$, as:

$$\mathcal{I}(B,a) = \{B \cap M_{a1},\ B \cap M_{a2},\ \ldots,\ B \cap M_{a\,n(a)}\} \setminus \{\emptyset\}.$$

Notice that if $B \subseteq X$ is non-empty, and $a \in A$, then $\mathcal{I}(B,a)$ is an information structure on $B$.
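
As an illustration of Definitions 2.1 and 2.2, the following sketch builds the partition induced by a signal function and then restricts it to a subset of states. It is a minimal example under stated assumptions: the signal function is passed as an ordinary Python callable, states are small integers, and the names `information_structure` and `induced_structure` are hypothetical rather than taken from the paper.

```python
def information_structure(states, eta):
    """Partition `states` into the cells M_ay = {x : eta(x) = y}."""
    cells = {}
    for x in states:
        cells.setdefault(eta(x), set()).add(x)
    # Order the cells by signal value y = 1, 2, ..., n(a).
    return [frozenset(cells[y]) for y in sorted(cells)]

def induced_structure(B, M_a):
    """Intersect B with every cell of M_a and drop the empty intersections."""
    B = frozenset(B)
    return [B & cell for cell in M_a if B & cell]

# Six states and one action whose signal is 1, 2 or 3.
X = range(6)
M_a = information_structure(X, lambda x: 1 if x < 2 else 2 if x == 2 else 3)
print(M_a)
print(induced_structure({0, 1, 3}, M_a))
```

Because empty intersections are discarded, the result is always a partition of B into non-empty cells, which is what the remark after Definition 2.2 asserts.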

Assumption: The decision-maker can take up to $r$ information-gathering actions, where $1 \le r \le n$. Since we include the null information action in $A$ (and its associated cost will be assumed to be zero), we can, without loss of generality, assume that the decision-maker takes exactly $r$ information-gathering actions. We also assume that there are no duplicate information structures, i.e.,

$$(\forall a, a' \in A): M_a = M_{a'} \Rightarrow a = a'.$$

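2.3. Definition. A feasible strategy for $\mathcal{D}$, $\sigma$, is a sequence of $r+1$ pairs:

$$\sigma = \langle (\mathcal{B}_1,\alpha_1), (\mathcal{B}_2,\alpha_2), \ldots, (\mathcal{B}_r,\alpha_r), (\mathcal{B}_{r+1},\delta) \rangle$$

satisfying:

1. $\mathcal{B}_1 = \{X\}$.

2. a. $\alpha_t: \mathcal{B}_t \to A$ for $t = 1, 2, \ldots, r$.

b. $\mathcal{B}_{t+1} = R(\mathcal{B}_t, \alpha_t)$ for $t = 1, 2, \ldots, r$.

3. $\delta: \mathcal{B}_{r+1} \to D$.

We shall denote the set of all feasible strategies for $\mathcal{D}$ by "$\Sigma(\mathcal{D})$".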
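
The operator R is not defined in this section. The sketch below assumes the natural reading, namely that R applies to each cell B of the current structure the action assigned to it and collects the induced structures of Definition 2.2. That reading, the function name `refine`, and the toy state space are all assumptions made for illustration.

```python
def refine(structure, alpha, M):
    """Assumed reading of R: split every cell B by the action alpha[B]."""
    refined = []
    for B in structure:
        a = alpha[B]                       # action chosen for this cell
        refined.extend(B & cell for cell in M[a] if B & cell)
    return refined

X = frozenset(range(4))
M = {
    0: [X],                                         # null action, n(0) = 1
    1: [frozenset({0, 1}), frozenset({2, 3})],
    2: [frozenset({0, 2}), frozenset({1, 3})],
}

B1 = [X]                                            # condition 1: start from {X}
B2 = refine(B1, {X: 1}, M)                          # first step uses action 1
B3 = refine(B2, {B2[0]: 2, B2[1]: 0}, M)            # action may differ per cell
print(B2)
print(B3)
```

Here r = 2: the second step refines one cell further with action 2 and applies the null action to the other, which shows how a strategy that effectively uses fewer than r informative actions on some path still fits the "exactly r actions" convention.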

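We shall often find it convenient to regard a feasible strategy, $\sigma = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r), (\mathcal{B}_{r+1},\delta) \rangle$, as being composed of two parts:

i. the information-gathering strategy:

$$\alpha = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r) \rangle,$$

ii. the decision strategy, $(\mathcal{B}_{r+1},\delta)$.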

Accordingly,  we  define  the  following: 


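2.4. Definition. A feasible information-gathering strategy for $\mathcal{D}$, $\alpha$, is a sequence of $r$ pairs:

$$\alpha = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r) \rangle$$

satisfying 1 and 2 of Definition 2.3; and a feasible decision strategy for $\mathcal{D}$ is a pair $(\mathcal{B},\delta)$, where

1. there exists a feasible information-gathering strategy for $\mathcal{D}$,

$$\alpha = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r) \rangle$$

such that

$$R(\mathcal{B}_r, \alpha_r) = \mathcal{B},$$

2. $\delta: \mathcal{B} \to D$.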


2.1. Costs and Payoffs of Strategies

The  following  two  definitions  (and  the  preceding  results)  will  enable  us  to  provide 
a  convenient  characterization  of  the  expected  cost  of  a  feasible  strategy. 

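2.1.1. Definition. Let $\alpha = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r) \rangle$ be a feasible information-gathering strategy for $\mathcal{D}$, and let $\mathcal{B}_{r+1} = R(\mathcal{B}_r, \alpha_r)$. For each $q \in \{1, \ldots, r+1\}$, and each $B \in \mathcal{B}_q$, we define the sequence $\langle \beta_t(B) \rangle_{t=1}^{q}$ by:

$$\beta_t(B) = \text{that } B' \in \mathcal{B}_t \text{ such that } B \cap B' \ne \emptyset. \tag{18}$$

(It is shown in Moore and Whinston [1986] that $\langle \beta_t(B) \rangle_{t=1}^{q}$ is well-defined.) We shall refer to $\beta_t(B)$ as the predecessor of $B$ at $t$.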

2.1.2. Definition. Let $\alpha = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r) \rangle$ be a feasible information-gathering strategy for $\mathcal{D}$, and let $\mathcal{B}_{r+1} = R(\mathcal{B}_r, \alpha_r)$. For each $q \in \{1, \ldots, r+1\}$, and each $B \in \mathcal{B}_q$, we define $a(B)$ as the sequence (of length $q-1$) of actions taken by the strategy $\alpha$ along the path that yields $B$; that is,

$$a(B) = \langle a(1,B), \ldots, a(q-1,B) \rangle,$$

where we define

$$a(t,B) = \alpha_t[\beta_t(B)] \quad \text{for } t = 1, \ldots, q-1.\,^2$$

2. That is, $a(t,B)$ is the action taken at step $t$ ($t = 1, \ldots, q-1$) along the path that yields $B$.
Assumption: We suppose that with each $a \in A$ is associated a nonnegative cost, $c(a)$, the cost of employing action $a$. Further, we assume that $c(0) = 0$.

In  a  given  realization  of  the  type  of  decision  problem  under  study,  the  application 
of  a  feasible  information-gathering  strategy, 

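$$\alpha = \langle (\mathcal{B}_1,\alpha_1), \ldots, (\mathcal{B}_r,\alpha_r) \rangle,$$

will result in the determination that $x$, the true state, is an element of some $B \in \mathcal{B}_{r+1} = R(\mathcal{B}_r, \alpha_r)$. The cost of determining that $x \in B$ will be the sum of the costs of all the actions taken along the path yielding (ending in) $B$, and will therefore be given by:

$$C(B) = \sum_{t=1}^{r} c[a(t,B)]. \tag{1}$$

The expected cost of the information-gathering strategy, $\alpha$, will then be given by

$$\Gamma(\alpha) = \sum_{B \in \mathcal{B}_{r+1}} \pi(B)\,C(B); \tag{2}$$

and of a feasible strategy, $\sigma = \langle \alpha, \mathcal{B}_{r+1}, \delta \rangle$, will be given by

$$\Gamma(\sigma) = \Gamma(\alpha) = \sum_{B \in \mathcal{B}_{r+1}} \pi(B)\,C(B). \tag{3}$$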

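As noted earlier, the function $\omega: X \times D \to \mathbb{R}$ yields the (gross) payoff associated with a given state $x \in X$, given a final decision $d \in D$. Thus if $\sigma = \langle \alpha, \mathcal{B}_{r+1}, \delta \rangle$ is a feasible strategy for $\mathcal{D}$, its expected gross payoff is given by

$$\Omega(\sigma) = \sum_{B \in \mathcal{B}_{r+1}} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)]; \tag{4}$$

and the expected net payoff of the strategy will be given by

$$\Omega^*(\sigma) = \sum_{B \in \mathcal{B}_{r+1}} \Bigl[ \sum_{x \in B} \phi(x)\,\omega(x, \delta[B]) - \pi(B)\,C(B) \Bigr] \tag{5}$$

$$= \sum_{B \in \mathcal{B}_{r+1}} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{B \in \mathcal{B}_{r+1}} \pi(B)\,C(B)$$

$$= \Omega(\sigma) - \Gamma(\sigma).$$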
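
The three quantities above can be evaluated directly once a strategy's final information structure is known. The sketch below is illustrative only: it assumes a strategy is summarized by a list of triples (final cell B, actions taken on the path to B, decision for B), which is a hypothetical encoding, and it uses a made-up four-state problem with one informative action of cost 0.1.

```python
def strategy_payoffs(phi, omega, cost, leaves):
    """Return (gross, information cost, net) for a strategy given as
    (cell B, actions on the path to B, final decision for B) triples."""
    gross = 0.0
    info_cost = 0.0
    for B, actions, d in leaves:
        pi_B = sum(phi[x] for x in B)                  # pi(B)
        C_B = sum(cost[a] for a in actions)            # C(B), path cost
        gross += sum(phi[x] * omega(x, d) for x in B)  # inner sum of (4)
        info_cost += pi_B * C_B                        # term of (3)
    return gross, info_cost, gross - info_cost

phi = {x: 0.25 for x in (1, 2, 3, 4)}                  # uniform density
cost = {0: 0.0, 1: 0.1}                                # c(0) = 0 as assumed
# Payoff 1 when the decision names the correct half of the state space.
omega = lambda x, d: 1.0 if (x <= 2) == (d == "low") else 0.0

informed = [({1, 2}, [1], "low"), ({3, 4}, [1], "high")]
uninformed = [({1, 2, 3, 4}, [0], "low")]
print(strategy_payoffs(phi, omega, cost, informed))
print(strategy_payoffs(phi, omega, cost, uninformed))
```

In this toy problem the informative action raises the gross payoff from 0.5 to 1.0 at an expected cost of 0.1, so the informed strategy has the higher net payoff. With a cost above 0.5 the comparison would reverse, which is the trade-off that the goal stated in (6) formalizes.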

We  can  now  more  formally  state  the  goal  of  our  decision  problem  as: 

choose $\sigma^* \in \Sigma(\mathcal{D})$ such that for all $\sigma \in \Sigma(\mathcal{D})$: (6)

$$\Omega(\sigma^*) - \Gamma(\sigma^*) \ge \Omega(\sigma) - \Gamma(\sigma).$$
The following definitions and results will be useful in our investigations in Section 3.

2.1.3. Definition. We define the finest information structure obtainable from $A$, $\mathcal{B}_A$, by:

$$\mathcal{B}_A = \Bigl\{ \bigcap_{a=1}^{n} M_{a\,y_a} \;:\; y_a \in \{1, \ldots, n(a)\} \text{ for } a = 1, \ldots, n \Bigr\} \setminus \{\emptyset\}.$$

It can be shown that $\mathcal{B}_A \le \mathcal{B}_{r+1}$, for any feasible final information structure, $\mathcal{B}_{r+1}$;
which  justifies  our  terminology  in  the  above  definition.  This  gives  us  a  simple  neces¬ 
sary  condition  which  must  be  satisfied  by  any  feasible  final  information  structure. 

Notice that in a given realization of the decision model, $\mathcal{B}_A$ represents the best information that can be obtained even if $r \le n$. (Thus, for example, the finite memory of even the largest computer available puts an upper limit on the number of decimal places one can use in representing a real number.) From the standpoint of the decision problem, therefore, any data concerning individual elements of members of $\mathcal{B}_A$ is, in a sense, irrelevant to the decision at hand.

2.1.4. Definitions. If $B \subseteq X$ is non-empty, we define the potential gross payoff associated with $B$, $v(B)$, and the conditionally optimal decision set for $B$, $D^*(B)$, by

$$v(B) = \max_{d \in D} \sum_{x \in B} \phi(x \mid B)\,\omega(x,d), \tag{11}$$

and

$$D^*(B) = \Bigl\{ d \in D \;\Bigm|\; \sum_{x \in B} \phi(x \mid B)\,\omega(x,d) = v(B) \Bigr\}, \tag{12}$$

respectively.

Notice that we can equally well define $D^*(B)$, the conditionally optimal set for $B$, as:

$$D^*(B) = \Bigl\{ d \in D \;\Bigm|\; \sum_{x \in B} \phi(x)\,\omega(x,d) = \pi(B)\,v(B) \Bigr\}. \tag{13}$$

Given  this  consideration,  the  following  result  becomes  more  or  less  immediate. 

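2.1.5. Proposition. If $\sigma = \langle \alpha, \mathcal{B}_{r+1}, \delta \rangle$ is optimal for $\mathcal{D}$, then for each $B \in \mathcal{B}_{r+1}$ we must have $\delta(B) \in D^*(B)$. Furthermore, the expected gross payoff for $\sigma$, $\Omega(\sigma)$, will be given by:

$$\Omega(\sigma) = \sum_{B \in \mathcal{B}_{r+1}} \pi(B)\,v(B).$$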
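
Definitions 2.1.4 and Proposition 2.1.5 translate into a short computation. The sketch below assumes that $\phi(x \mid B)$ means $\phi(x)/\pi(B)$, which is the reading under which (12) and (13) agree; the function names and the three-state example are hypothetical.

```python
def potential_payoff(B, phi, omega, decisions):
    """Return v(B) and the conditionally optimal decision set D*(B)."""
    pi_B = sum(phi[x] for x in B)
    # Conditional expected payoff of each decision given that x lies in B.
    expected = {d: sum(phi[x] * omega(x, d) for x in B) / pi_B
                for d in decisions}
    v = max(expected.values())
    D_star = {d for d, e in expected.items() if abs(e - v) < 1e-12}
    return v, D_star

def optimal_gross_payoff(structure, phi, omega, decisions):
    """Sum of pi(B) * v(B) over the final structure (Proposition 2.1.5)."""
    return sum(sum(phi[x] for x in B)
               * potential_payoff(B, phi, omega, decisions)[0]
               for B in structure)

phi = {"a": 0.2, "b": 0.3, "c": 0.5}
decisions = ["a", "b", "c"]
omega = lambda x, d: 1.0 if x == d else 0.0   # payoff 1 for naming the state

print(potential_payoff({"a", "b"}, phi, omega, decisions))
print(optimal_gross_payoff([{"a", "b"}, {"c"}], phi, omega, decisions))
```

Within the cell {a, b} the more probable state b is the only conditionally optimal guess, and the structure as a whole yields an expected gross payoff of 0.5 × 0.6 + 0.5 × 1.0 = 0.8.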


3.  Common  Index  Problem 

The  common  index  problem  was  briefly  described  in  Section  1  as  a  generalization 
of  the  file  search  problem.  We  shall  formalize  the  common  index  problem  using  the 
decision  theoretic  framework  just  presented,  but  we  first  present  the  formalization  of 
the  file  search  problem  in  both  the  decision  theoretic  framework  and  the  computer  sci¬ 
ence  framework. 

Aho,  Hopcroft  and  Ullman  present  the  file  search  problem  as: 

... we were given a set $S = \{a_1, a_2, \ldots, a_n\}$, that is, a subset of some large universal set $U$ [which is linearly ordered by a relation $\le$], and we were asked to design a data structure that would allow us to process efficiently a sequence $\sigma$ consisting only of MEMBER instructions. Let us reconsider this problem, but this time let us assume that, in addition to being given the set $S$, we are given the probability that the instruction MEMBER($a$,$S$) will appear in $\sigma$ for all elements $a$ in the universal set $U$. We would now like to design a binary search tree for $S$ such that a sequence $\sigma$ of MEMBER instructions can be processed on-line with the smallest expected number of comparisons. [Aho, Hopcroft and Ullman, 1974, p. 119].


3.1. The Computer File Search Problem - Decision Theoretic Framework

This  section  presents  the  decision  theoretic  approach  to  file  search  developed  in 
Moore  and  Whinston  [1987,  Sec.  4.2]. 

We suppose that there is some universal set, $U$, which is finite and linearly ordered, and that we are dealing with a non-empty subset,

$$S = \{b_1, b_2, \ldots, b_n\} \subseteq U, \tag{1}$$

$$b_i < b_{i+1} \quad \text{for } i = 1, \ldots, n-1. \tag{2}$$

We suppose that the probabilities

$$\Pr(b < b_1), \quad \Pr(b > b_n),$$

$$\Pr(b_i < b < b_{i+1}) \quad \text{for } i = 1, \ldots, n-1,$$

and

$$\Pr(b = b_i) \quad \text{for } i = 1, \ldots, n,$$

are well-defined, for $b$ a random element of $U$.

The basic idea is that the elements of $S$ correspond to a stored data set drawn from $U$. We consider the problem of searching the set in order to determine whether a randomly drawn element from $U$, $b$, is in the set $S$ or not; and if it is in $S$, to determine its location (i.e., for which $i$ we have $b = b_i$). The available experiments can be denoted by:

$$A = \{0, 1, \ldots, n\},$$

where for $a = 1, \ldots, n$, the experiment $a$ is interpreted as:

"compare $b$ with $b_a$"

(and $a = 0$ represents the null information experiment). Thus, for $a \in \{1, \ldots, n\}$, the possible outcomes of the experiment are:

$$b < b_a, \quad b = b_a, \quad \text{or} \quad b > b_a. \tag{3}$$

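For notational convenience, we shall represent the state space as:

$$X = Y \cup Z,$$

where

$$Y = \{y_1, \ldots, y_n\}$$

with the interpretation:

$$x = y_i \iff b = b_i \quad \text{for } i = 1, \ldots, n;$$

and

$$Z = \{z_0, z_1, \ldots, z_n\}$$

with the interpretation:

$$x = z_j \iff \begin{cases} b < b_1 & \text{if } j = 0, \\ b_j < b < b_{j+1} & \text{for } j = 1, \ldots, n-1, \\ b > b_n & \text{for } j = n. \end{cases}$$

We then define

$$p_i = \Pr(x = y_i) = \Pr(b = b_i) \quad \text{for } i = 1, \ldots, n,$$

$$q_0 = \Pr(x = z_0) = \Pr(b < b_1),$$

$$q_j = \Pr(x = z_j) = \Pr(b_j < b < b_{j+1}) \quad \text{for } j = 1, \ldots, n-1,$$

and

$$q_n = \Pr(x = z_n) = \Pr(b > b_n).$$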


From (3) and our specification of $X$, we see that for each $a \in \{1, \ldots, n\}$, the information structure for $a$, $M_a$, can be written as:

$$M_a = \{M_{a1}, M_{a2}, M_{a3}\},$$

where

$$M_{a1} = \{y_1, \ldots, y_{a-1}\} \cup \{z_0, \ldots, z_{a-1}\},$$

$$M_{a2} = \{y_a\},$$

and

$$M_{a3} = \{y_{a+1}, \ldots, y_n\} \cup \{z_a, \ldots, z_n\}.$$


To complete our specification of the problem, we note that we can specify $D$ as

$$D = \{0, 1, \ldots, n\},$$

with the interpretation:

$d = 0$ corresponds to the decision $b \notin S$ (i.e., $x \in Z$),
$d = j$ corresponds to the decision $b = b_j$ (i.e., $x = y_j$) for $j = 1, \ldots, n$.

It also seems appropriate here to specify our gross payoff function $\omega: X \times D \to \mathbb{R}$ as

$$\omega(x,d) = \begin{cases} \bar{\omega} > 0 & \text{if } d = 0 \text{ and } x \in Z, \text{ or if } d \in \{1, \ldots, n\} \text{ and } x = y_d, \text{ and} \\ 0 & \text{otherwise.} \end{cases}$$


The  user  is  thus  indifferent  to  whether  the  requested  record  is  in  a  gap  or  an  element  of 
the  file,  as  long  as  he/she  is  correctly  informed  of  the  status  of  the  requested  record. 

We  also  suppose  that  there  exists  some  constant  c  >  0  such  that: 

c(a)  =  c  for  a  =  l,...,n. 
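
With a constant cost per comparison, and assuming the search is always carried through to complete information, minimizing the expected information cost amounts to minimizing the expected number of comparisons. The sketch below implements the standard textbook dynamic program for the optimal binary search tree in that setting. It is offered as an illustration of the kind of recursion involved, not as the authors' procedure, and the function name `optimal_search` is hypothetical. Here `p[k-1]` stands for $p_k$ and `q[j]` for $q_j$.

```python
from functools import lru_cache

def optimal_search(p, q):
    """Return (expected comparisons, first record to compare against) for an
    optimal comparison tree over records b_1..b_n with gaps z_0..z_n."""
    n = len(p)
    assert len(q) == n + 1

    def weight(i, j):
        # Probability that b falls among records i+1..j or gaps i..j.
        return q[i] + sum(p[k - 1] + q[k] for k in range(i + 1, j + 1))

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:                       # only a gap remains: no comparison
            return 0.0, None
        # Try each record k as the next comparison; the outcomes b < b_k and
        # b > b_k lead to the subproblems (i, k-1) and (k, j).
        sub, root = min((best(i, k - 1)[0] + best(k, j)[0], k)
                        for k in range(i + 1, j + 1))
        return weight(i, j) + sub, root

    return best(0, n)

print(optimal_search([1/3, 1/3, 1/3], [0, 0, 0, 0]))
print(optimal_search([0.8, 0.1, 0.1], [0, 0, 0, 0]))
```

For three equally likely records the middle record is compared first, while a record requested 80% of the time is compared first even though it sits at the end of the ordering. Multiplying the first returned value by the constant $c$ gives the expected information cost of the corresponding strategy under the stated assumption.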


3.2. The Common Index Problem - Decision Theoretic Framework

In  the  common  index  problem,  a  random  individual  wants  to  determine  if  a  given 
element, the ideal element, from a universal set is present in a known subset (the file). In the common index problem, however, if the ideal element is not in the file then the
best  available  element  is  to  be  returned.  To  determine  the  best  available  element  in  the 
file  we  must  make  a  couple  of  assumptions.  The  first  assumption  is  that  the  index  asso¬ 
ciated  with  each  record  has  an  intrinsic  meaning.  As  an  example,  if  the  index  for  a  file 
containing  data  on  a  set  of  cars  is  the  price  of  the  car,  then  the  index  has  a  universally 
agreed  upon  meaning.  The  second  assumption  necessary  to  determine  a  best  available 
record  is  that  there  exists  an  individual  utility  function  associated  with  each  potential 
user  and  that  the  utility  function  is  a  function  of  the  index.  The  utility  functions  meas¬ 
ure  the  individuals’  preferences  over  the  elements  in  the  file,  and  can  be  interpreted  as 
measuring the amount or quality of information in a record, or merely the desirability of
the  object  that  the  record  represents.  We  are  assuming  that  all  of  the  relevant  informa¬ 
tion  for  choosing  between  records  is  contained  in  the  index.  In  a  future  paper,  we  plan 
to extend the ideas developed here to multiattribute data base queries where the utility
functions  are  functions  of  the  attributes. 

As an example of the common index problem, consider a person who wants to
vote in an upcoming election. The only information available about the candidates is a
rating from 0 to 20, where 0 implies the candidate is left-wing and 20 implies the
candidate is right-wing. The election is for the U.S. economic advisor, and the set of
possible candidates is the set of all people, both historical and modern. As of election
day, the set of candidates includes: Karl Marx, Fredrick Engels, John Maynard Keynes,
J.R. Hicks and Adam Smith. An omnipotent panel has assigned index values to each of
the candidates as shown in Table 1. The prospective voter must find the one candidate
that maximizes his/her utility by searching the index. The index value returned will
correspond to the best available candidate that he/she can vote for. The search process
is similar to the computer file search, except that the user is not looking for a particular
record, but for the best record available.


Table 1

Candidate              Index value
Karl Marx              1
Fredrick Engels        2
John Maynard Keynes    6
J.R. Hicks             19.5
Adam Smith             20

To formalize the common index problem, we assume that there exists some
universal set $T$. The set $S$ is the set of available alternatives from which a best element
can be chosen. $S = \{s_1, \dots, s_m\}$ is a subset of $T$, and is assumed to be non-empty.

Define $\mathcal{W}$ as the common index on $T$, where $\mathcal{W} = [0, \bar{w}] \subset \mathbb{R}_+$. The function $y$,
$y: T \to \mathcal{W}$, assigns an index value to each element in the set $T$. $y$ may or may not be
known, but we assume $y(s)$ is known for all $s \in S$. We define

$$W = \{w \in \mathcal{W} : (\exists\, s \in S) : w = y(s)\}.$$

$W = \{w_1, \dots, w_n\}$ is the set of indices of the available elements $s \in S$, and it is
over $W$ that the search is conducted.

The set of potential users is designated by $U$, and each user is identified by his/her
utility function $u \in U$. The set $U$ does not necessarily correspond to the set of people
who may search the file, but rather to the set of possible requests they may make. Thus,
if a person searches the file multiple times, each time with a different set of preferences
over the index, then the person is considered to be multiple users with different utility
functions.

The probability distribution, $h(\cdot)$, is a distribution over $U$ and is assumed to be
known. $h(u)$ denotes the probability that user $u \in U$ will query the system:

$$h: U \to [0,1], \qquad u: \mathcal{W} \to \mathbb{R} \quad \forall\, u \in U,$$

where $\mathcal{W} = [0, \bar{w}]$ is the common index.

To make the problem more tractable we will restrict the set of utility functions to
single-peaked, symmetric functions that are strictly decreasing away from the maximum.
Specifically, we require that for each $u \in U$

$$(\exists\, v_u \in \mathcal{W} \text{ and } \hat{u}: \mathbb{R}_+ \to \mathbb{R}) : u(w) = \hat{u}(|w - v_u|),$$

where $\hat{u}$ is strictly decreasing, and $v_u$ is the ideal element in the common index
$\mathcal{W} = [0, \bar{w}]$ for the user represented by the utility function $u$ (footnote 3).

These restrictions allow the individual utility functions to be completely
characterized by $v_u$. All individual utility functions maximized at $v_u$ will have the same
best available element; thus a user need merely specify his/her ideal point without having
to express the complete functional form of his/her utility function.
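As a minimal sketch of why the ideal point alone suffices: because each admissible utility function is symmetric and strictly decreasing in $|w - v_u|$, the best available element is simply the available index value nearest to $v_u$. The fragment below uses the index values of Table 1. The function name `best_available` and the tie-breaking rule (the smaller index value wins) are assumptions made for illustration only.

```python
def best_available(indices, ideal):
    """Return the index value in `indices` closest to `ideal`.

    Ties are broken in favour of the smaller index value; this rule is an
    assumption of the sketch, not part of the original formulation.
    """
    return min(sorted(indices), key=lambda w: abs(w - ideal))


# Index values from Table 1.
candidates = {
    1: "Karl Marx",
    2: "Fredrick Engels",
    6: "John Maynard Keynes",
    19.5: "J.R. Hicks",
    20: "Adam Smith",
}

choice = best_available(candidates.keys(), ideal=10)
print(candidates[choice])  # a voter with ideal point 10 is matched to index 6
```

Only the ideal point enters the computation; the functional form of the utility never needs to be supplied.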

Due to the restriction on the individual utility functions, the best available element
will always be the element $w \in W$ closest to $v_u$. The problem therefore becomes:
given a randomly drawn utility function $u \in U$, find the $k \in \{1, \dots, n\}$ satisfying

$$|w_k - v_u| \le |w_j - v_u|, \qquad j = 1, \dots, n.$$

We now want to develop a search strategy that will maximize the social welfare of the
users, where the social welfare function is defined by:

$$\sum_{B \in \mathcal{B}_{r+1}} \sum_{x \in B} \phi(x)\,\omega(x, \delta(B)) \;-\; \sum_{B \in \mathcal{B}_{r+1}} \pi(B)\,C(B).$$

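To compute the optimal information structure for the search we must first define
the decision problem's state space. We define

$$X = \{x_0, x_1, \dots, x_n, x_{n+1}\},$$

where

$$x = x_j \quad \text{if} \quad
\begin{cases}
v_u < 0, & j = 0,\\
v_u \in [v_{j-1}, v_j), & j = 1, \dots, n-1,\\
v_u \in [v_{n-1}, \bar{w}], & j = n,\\
v_u > \bar{w}, & j = n+1,
\end{cases}$$

and

$$v_j =
\begin{cases}
0, & j = 0,\\
(w_j + w_{j+1})/2, & j = 1, \dots, n-1,\\
\bar{w}, & j = n.
\end{cases}$$

The elements of the state space partition the common index $[0, \bar{w}]$ into half-open
intervals. The endpoints of the elements $x \in X$ bisect the intervals between the elements
of $W$. For example, if the common index is $[0,10]$ and $W = \{3,7\}$, then
$X = \{(-\infty,0), [0,5), [5,10], (10,\infty)\}$.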
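The construction of the breakpoints $v_j$ and the lookup of the state containing a given ideal point can be sketched as follows. The helper names `breakpoints` and `state_index` are hypothetical, and the sketch assumes the available index values are distinct and lie inside the index interval; it is checked only against the $\{3,7\}$ example above.

```python
from bisect import bisect_right


def breakpoints(available, w_max):
    """Return [v_0, ..., v_n] for available indices w_1 < ... < w_n.

    v_0 = 0, v_n = w_max, and the interior breakpoints are the midpoints
    between consecutive available index values.
    """
    w = sorted(available)
    mids = [(a + b) / 2 for a, b in zip(w, w[1:])]
    return [0.0] + mids + [float(w_max)]


def state_index(v_u, v, w_max):
    """Return j such that the ideal point v_u lies in state x_j (0 <= j <= n+1)."""
    n = len(v) - 1
    if v_u < 0:
        return 0
    if v_u > w_max:
        return n + 1
    # x_j = [v_{j-1}, v_j) for j < n; the last interior state is closed at w_max.
    return min(bisect_right(v, v_u), n)


v = breakpoints([3, 7], 10)
print(v)                        # [0.0, 5.0, 10.0]
print(state_index(4.9, v, 10))  # 1, i.e. the state [0, 5)
print(state_index(5.0, v, 10))  # 2, i.e. the state [5, 10]
```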


We now define the probability that $x \in X$ is the true state (i.e., $v_u \in x_j$). First
note that the probability measure $h(\cdot)$ induces a probability measure $h^*(\cdot)$ on the
common index, where $h^*(w)$ is the probability that $v_u = w$. The only restriction needed
here on $h(\cdot)$ and $h^*(\cdot)$ is that they can be used to generate a probability distribution
for the state space, $X$. If $h(\cdot)$ is a continuous distribution on $U$, the probability that
$x$ is the true state is $\phi(x)$, where:

$$\phi(x_j) =
\begin{cases}
0, & j \in \{0, n+1\},\\
p_j, & j \in \{1, \dots, n\},
\end{cases}$$

and

$$p_j = \int_{v_{j-1}}^{v_j} h^*(w)\, dw.$$

3. This is essentially the assumption basic to Coombs' "unfolding technique". See Coombs
[1950] and, for a discussion of empirical tests and extensions, Coombs, Dawes, and Tversky
[1970], pp. 55-66.

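The set of available experiments is defined by $A = \{0, 1, \dots, n\}$, where $a = 0$ is the
null experiment and $a = 1, \dots, n$ can be interpreted as:

compare $v_u$ with $v_{a-1}$ and $v_a$,

with possible outcomes

$$v_u < v_{a-1} \quad \text{or} \quad v_{a-1} \le v_u < v_a \quad \text{or} \quad v_u \ge v_a.$$

For each $a \in A' = A \setminus \{0\}$ the information structure induced by $a$, $M_a$, is trinary:

$$M_a = \{M_{a1}, M_{a2}, M_{a3}\},$$

where

$$M_{a1} = \{x_0, \dots, x_{a-1}\}, \qquad M_{a2} = \{x_a\}, \qquad M_{a3} = \{x_{a+1}, \dots, x_{n+1}\},$$

and

$$M_{01} = M_{02} = \emptyset, \quad M_{03} = X, \qquad M_{n+1,1} = X, \quad M_{n+1,2} = M_{n+1,3} = \emptyset.$$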
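The three-way split induced by an experiment can be written down directly. The following is a small sketch in which states are represented by their subscripts $0, \dots, n+1$; the function name `information_structure` is an assumption of the sketch.

```python
def information_structure(a, n):
    """Return (M_a1, M_a2, M_a3) as lists of state subscripts 0..n+1.

    a = 0 is the null experiment, which yields no information;
    a = 1..n compares the ideal point with v_{a-1} and v_a.
    """
    states = list(range(n + 2))
    if a == 0:
        return [], [], states
    return states[:a], [a], states[a + 1:]


# With n = 2 (states x_0..x_3), experiment 1 isolates x_1.
print(information_structure(1, 2))  # ([0], [1], [2, 3])
```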

To complete the specification we must define $D$, $c(a)$, and $\omega(x,d)$. The decision set is
$D = \{1, \dots, n\}$, where $d = j$ corresponds to the decision that $v_u \in x_j$ and that the best
available element is $w_j$.

The cost function is $c(a) = c$ for $a = 1, \dots, n$, where $c$ is a strictly positive constant,
and $c(0) = 0$.

We shall look at two payoff functions:

$$\omega(x,d) =
\begin{cases}
\bar{\omega} > 0 & \text{if } v_u \in x_d,\\
0 & \text{otherwise,}
\end{cases}$$

and

$$\omega(x,d) = -|w_u - w_d|,$$

where $w_d \in x_d$ and $w_u$ is such that $v_u \in x_u$ ($w_d$ is the element $w \in W$ chosen by the
search process, and $w_u$ is the element of $W$ that maximizes $\hat{u}(|v_u - w_j|)$ over
$w_j \in W$).

The first payoff function identifies the social welfare function with the utility of
the designer or operator of the system; the idea basically being that if the programmer
(or operator) finds the best element in the file for a given user, then he/she gets a pat on
the head, and gets trouble otherwise. The second payoff function identifies the social
welfare more closely with the utility of the user, and is used as an approximation to the
function $\omega(x, v_u, d)$ given by:

$$\omega(x, v_u, d) = -(|w_d - v_u| - |w_u - v_u|) = |w_u - v_u| - |w_d - v_u|,$$

where, again, $w_u$ is that $w_j$ such that $v_u \in x_j$, and $w_d$ is the element chosen by the
search process.

Note that we always have

$$\omega(x,d) = -|w_u - w_d| \le |w_u - v_u| - |w_d - v_u| \equiv \omega(x, v_u, d).$$

Moreover, if

$$w_d \le w_u \le v_u \quad \text{or} \quad w_d \ge w_u \ge v_u,$$

then

$$\omega(x,d) = \omega(x, v_u, d).$$

If, however,

$$w_u \ge v_u > w_d \quad \text{or} \quad w_d > v_u \ge w_u,$$

the two payoff functions will differ (footnote 4). On the other hand, since

$$\omega(x, v_u, d) = |w_u - v_u| - |w_d - v_u|$$

is a function of $v_u$ as well as $x$, we cannot use it as the payoff function in our problem
(that is, its use would necessitate a different specification of the state space) (footnote 5).
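The relationship between the index-based payoff and the user-based payoff can be checked numerically. The sketch below evaluates both for a few ideal points; the helper names and the choice of the Table 1 index values as test data are assumptions made for illustration.

```python
def nearest(W, v_u):
    """w_u: the available index value closest to the ideal point v_u."""
    return min(sorted(W), key=lambda w: abs(w - v_u))


def payoff_index(W, v_u, w_d):
    """omega(x, d) = -|w_u - w_d|; depends on v_u only through w_u."""
    return -abs(nearest(W, v_u) - w_d)


def payoff_user(W, v_u, w_d):
    """omega(x, v_u, d) = |w_u - v_u| - |w_d - v_u|."""
    w_u = nearest(W, v_u)
    return abs(w_u - v_u) - abs(w_d - v_u)


W = [1, 2, 6, 19.5, 20]
# w_d <= w_u <= v_u: the two payoffs coincide.
print(payoff_index(W, 7, 2), payoff_user(W, 7, 2))        # -4 -4
# v_u lies strictly between w_u and w_d: the index-based payoff is smaller.
print(payoff_index(W, 7, 19.5), payoff_user(W, 7, 19.5))  # -13.5 -11.5
```

The inequality between the two is an instance of the triangle inequality, which is why the index-based payoff never overstates the user's loss.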

The dynamic programming solution follows in a manner almost identical to the
one developed in Moore and Whinston [1987, Sec. 6.3] (footnote 6). We present it here for
completeness. The purpose is to calculate the information and decision strategy, $\sigma^*$,
such that $\Omega^*(\sigma^*) \ge \Omega^*(\sigma)$ for all $\sigma \in \Sigma(\mathcal{D})$. We assume $r \ge n$, and we define, for
$i \in \{0, 1, \dots, n\}$ and $j \in \{i+1, \dots, n+1\}$, $B_{ij} \subset X$, where

$$B_{ij} = \bigcup_{k=i+1}^{j-1} M_{k2} \quad (= \{x_{i+1}, \dots, x_{j-1}\} \text{ for our given state space}).$$

4. Notice that we cannot have $w_u > w_d \ge v_u$ or $v_u \ge w_d > w_u$, given the definition of $w_u$.

5. If it were the case that for each $u \in U$, $v_u$ were an element of $W$, then this difficulty would
disappear; i.e., $\omega(x,d)$ would always equal $\omega(x, v_u, d)$. However, this is a condition which we
would not expect to be realized in practice.

6. The key condition needed to develop this sort of dynamic programming solution is that the
relation $>^*$ defined on $A$ by

$$a >^* a' \iff M_{a'1} \subsetneq M_{a1}$$

is a (strict) linear order; that is, that it is total, asymmetric, and transitive. Notice that this
condition is satisfied here.

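1. For $k \in \{1, 2, \dots, n\}$ we define $\Lambda(M_{k2})$ by

$$\Lambda(M_{k2}) = \pi(M_{k2})\,\nu(M_{k2}),$$

and for $i \in \{0, 1, \dots, n\}$ we define

$$\Lambda(B_{i,i+1}) = 0.$$

For $i \in \{0, 1, \dots, n-1\}$ and $j = i+2$ we define $\Lambda(B_{ij})$ by

$$\Lambda(B_{i,i+2}) = \pi(B_{i,i+2})\,\nu(B_{i,i+2}) = \Lambda(M_{i+1,2}).$$

2. For each $i \in \{0, 1, \dots, (n+1)-3\}$ we calculate

$$f(j) = \Lambda(B_{ij}) + \Lambda(M_{j2}) + \Lambda(B_{j,i+3}) - \pi(B_{i,i+3})\,c(j); \qquad j = i+1, i+2,$$

and

$$f(0) = \pi(B_{i,i+3})\,\nu(B_{i,i+3}).$$

We then define (footnote 7)

$$\Lambda(B_{i,i+3}) = \max\{f(0), f(i+1), f(i+2)\}$$

and

$$\alpha(i, i+3) = \min\{j \in \{0, i+1, i+2\} \mid f(j) = \Lambda(B_{i,i+3})\}.$$

3. For each $i \in \{0, 1, \dots, (n+1)-4\}$ we calculate

$$f(j) = \Lambda(B_{ij}) + \Lambda(M_{j2}) + \Lambda(B_{j,i+4}) - \pi(B_{i,i+4})\,c(j); \qquad j = i+1, i+2, i+3,$$

$$f(0) = \pi(B_{i,i+4})\,\nu(B_{i,i+4}),$$

$$\Lambda(B_{i,i+4}) = \max\{f(0), f(i+1), f(i+2), f(i+3)\},$$

$$\alpha(i, i+4) = \min\{j \in \{0, i+1, i+2, i+3\} \mid f(j) = \Lambda(B_{i,i+4})\}.$$

4. Having found $\Lambda(B_{i,i+(l-1)})$ for $i = 0, 1, \dots, n+1-(l-1)$, we compute for
$i \in \{0, 1, \dots, n+1-l\}$

$$f(j) = \Lambda(B_{ij}) + \Lambda(M_{j2}) + \Lambda(B_{j,i+l}) - \pi(B_{i,i+l})\,c(j) \qquad \text{for } j = i+1, \dots, i+l-1,$$

$$f(0) = \pi(B_{i,i+l})\,\nu(B_{i,i+l}),$$

$$\Lambda(B_{i,i+l}) = \max\{f(j) \mid j \in \{0, i+1, \dots, i+l-1\}\},$$

and

$$\alpha(i, i+l) = \min\{j \in \{0, i+1, \dots, i+l-1\} \mid f(j) = \Lambda(B_{i,i+l})\}.$$

5. Proceeding as above, we eventually obtain

$$\Lambda(B_{0,n+1}) = \Lambda(X) \quad \text{and} \quad \alpha(0, n+1).$$

7. Notice that if $c(i+1) = c(i+2)$, then $f(i+1) = f(i+2)$.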
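The recursion in steps 1 to 5 can be sketched as a memoized function over blocks $B_{ij}$. This is a sketch under stated assumptions, not the authors' implementation and not an attempt to reproduce the figures of Appendix B: it takes $f(0) = \pi(B)\nu(B)$ to be the best expected payoff obtainable by stopping and making a single decision for the whole block, it assumes $r \ge n$ so that the limit on the number of experiments never binds, and it uses a constant experiment cost. The names `solve`, `stop_value`, and `value` are hypothetical.

```python
from functools import lru_cache


def solve(w, p, c, payoff):
    """Dynamic programme over blocks B_ij = {x_{i+1}, ..., x_{j-1}}.

    w, p   : index values and probabilities of the states x_1..x_n
    c      : constant cost of any non-null experiment
    payoff : payoff(k, d) for true state x_k and decision d (both 1-based)
    Returns a function value(i, j) -> (Lambda(B_ij), alpha(i, j)).
    """
    n = len(w)

    def stop_value(i, j):
        # f(0): stop now and pick the single best decision d for the block.
        return max(
            sum(p[k - 1] * payoff(k, d) for k in range(i + 1, j))
            for d in range(1, n + 1)
        )

    @lru_cache(maxsize=None)
    def value(i, j):
        if j - i < 2:                      # empty block: Lambda = 0
            return 0.0, 0
        best, arg = stop_value(i, j), 0
        if j - i == 2:                     # single state: Lambda(M_k2)
            return best, arg
        mass = sum(p[k - 1] for k in range(i + 1, j))   # pi(B_ij)
        for k in range(i + 1, j):
            f_k = (value(i, k)[0] + value(k - 1, k + 1)[0]
                   + value(k, j)[0] - mass * c)
            if f_k > best:                 # strict ">" keeps the smallest maximiser
                best, arg = f_k, k
        return best, arg

    return value


# Two available elements, equal probabilities, distance payoff.
w = [3, 7]
p = [0.5, 0.5]
value = solve(w, p, c=1.0, payoff=lambda k, d: -abs(w[k - 1] - w[d - 1]))
print(value(0, len(w) + 1))  # (-1.0, 1): one experiment beats deciding blind
```

Raising the cost to 3 in this toy case makes stopping immediately optimal, which mirrors the point made later in the section that complete information is gathered only when it is worth its cost.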

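We then define the strategy $\sigma^* = \langle(\mathcal{B}^*_1, \alpha^*_1), \dots, (\mathcal{B}^*_r, \alpha^*_r), (\mathcal{B}^*_{r+1}, \delta^*)\rangle$ by

$$\alpha^*_1(X) = \alpha(0, n+1) \equiv a^*. \qquad (16)$$

We obtain one of two cases.

a. $\alpha^*_1(X) = 0$ and $\mathcal{B}^*_2 = M_{a^*} = \{X\}$; in which case, we complete the definition of
$\sigma^*$ by defining

$$\alpha^*_t(X) = 0 \quad \text{for } t = 2, \dots, r, \qquad (17)$$

and let

$$\delta^*(X) \in D^*(X). \qquad (18)$$

b. $\alpha^*_1 \in \{1, \dots, n\}$ and

$$\mathcal{B}^*_2 = M_{a^*} = \{B_{0,a^*}, M_{a^*2}, B_{a^*,n+1}\}.$$

Here we define $\alpha^*_2$ by

$$\alpha^*_2(B_{0,a^*}) = \alpha(0, a^*), \qquad \alpha^*_2(M_{a^*2}) = 0, \qquad \alpha^*_2(B_{a^*,n+1}) = \alpha(a^*, n+1). \qquad (19)$$

Having obtained $\alpha^*_{t-1}$ and $\mathcal{B}^*_t$, for $t \in \{3, \dots, r\}$, we have that each $B \in \mathcal{B}^*_t$ is
either of the form

$$B = B_{ij} \quad \text{for some } i \in \{0, 1, \dots, n\},\ j \in \{i+2, \dots, n+1\}, \qquad (20)$$

or is of the form

$$B = M_{k2} \quad \text{for some } k \in \{1, \dots, n\}. \qquad (21)$$

We then define $\alpha^*_t$ on $\mathcal{B}^*_t$ by:

$$\alpha^*_t(B) =
\begin{cases}
\alpha(i,j) & \text{if } B \text{ is of the form (20) with } j > i+2,\\
0 & \text{otherwise.}
\end{cases} \qquad (22)$$

Proceeding in this fashion we eventually obtain $\mathcal{B}^*_{r+1}$, and let $\delta^*(B)$ be an element
of $D^*(B)$, for each $B \in \mathcal{B}^*_{r+1}$.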

□ 


It will be an easy consequence of the following result that $\sigma^*$, as defined in
(17)-(22), above, is optimal for $\mathcal{D}$.

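3.2.1. Theorem. If $\sigma = \langle(\mathcal{B}_1, \alpha_1), \dots, (\mathcal{B}_r, \alpha_r), (\mathcal{B}_{r+1}, \delta)\rangle$ is a feasible strategy for $\mathcal{D}$,
$q \in \{1, \dots, r+1\}$, and $i \in \{0, 1, \dots, n\}$ and $j \in \{i+2, \dots, n+1\}$ are such that
$B_{ij} \in \mathcal{B}_q$, then (footnote 8)

$$\sum_{B \in \mathcal{B}_{r+1}(B_{ij})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] \;-\; \sum_{t=q}^{r} \sum_{B' \in \mathcal{B}_t(B_{ij})} \pi(B')\,c[\alpha_t(B')] \le \Lambda(B_{ij}), \qquad (23)$$

where for $q = r+1$, we define

$$\sum_{t=q}^{r} \sum_{B' \in \mathcal{B}_t(B_{ij})} \pi(B')\,c[\alpha_t(B')] = 0.$$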

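Proof. We distinguish two cases, based on the value of $q$.

a. $q = r+1$. Here the left-hand side of inequality (23) becomes:

$$\sum_{x \in B_{ij}} \phi(x)\,\omega[x, \delta(B_{ij})]; \qquad (24)$$

and, since (24) is less than or equal to

$$\pi(B_{ij})\,\nu(B_{ij}) \le \Lambda(B_{ij}),$$

the desired inequality follows at once.

b. $q \in \{1, \dots, r\}$. Here we establish our result for arbitrary $i \in \{0, 1, \dots, n-1\}$ by
induction on $l = j - i$, as follows.

i. $l = 2$ (and $j = i+2$). Here we have $B_{ij} \in \mathcal{B}_A$ and thus

$$\mathcal{B}_{r+1}(B_{ij}) = \{B_{ij}\},$$

so that the left-hand side of (23) becomes:

$$\sum_{x \in B_{ij}} \phi(x)\,\omega[x, \delta(B_{ij})] - \sum_{t=q}^{r} \pi(B_{ij})\,c[\alpha_t(B_{ij})]
\le \sum_{x \in B_{ij}} \phi(x)\,\omega[x, \delta(B_{ij})] \le \pi(B_{ij})\,\nu(B_{ij}) = \Lambda(B_{ij}).$$

ii. Suppose the desired inequality holds for $j = i + l$, where $l \in \{2, \dots, n-i\}$. Then
for $j = i + l + 1$, we have three possible cases.

Case 1. $\alpha_q(B_{ij}) = k \in \{i+1, \dots, j-1\}$. Here it follows from Theorem 6.2.5 of Moore
and Whinston [1987] that we can write the left-hand side of (23) as:

$$\begin{aligned}
&\sum_{B \in \mathcal{B}_{r+1}(B_{ik})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{t=q+1}^{r} \sum_{B' \in \mathcal{B}_t(B_{ik})} \pi(B')\,c[\alpha_t(B')] \\
&\quad + \sum_{B \in \mathcal{B}_{r+1}(B_{kj})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{t=q+1}^{r} \sum_{B' \in \mathcal{B}_t(B_{kj})} \pi(B')\,c[\alpha_t(B')] \\
&\quad + \sum_{B \in \mathcal{B}_{r+1}(M_{k2})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{t=q+1}^{r} \sum_{B' \in \mathcal{B}_t(M_{k2})} \pi(B')\,c[\alpha_t(B')] \\
&\quad - \pi(B_{ij})\,c(k).
\end{aligned} \qquad (25)$$

However, since $k \in \{i+1, \dots, j-1\}$ and $j = i + l + 1$, it follows that $k - i \le l$ and
$j - k \le l$. Consequently, it follows from our inductive hypothesis that (25) is less than or
equal to

$$\Lambda(B_{ik}) + \Lambda(B_{kj}) + \Lambda(M_{k2}) - \pi(B_{ij})\,c(k) \le \Lambda(B_{ij}). \qquad (26)$$

Case 2. $\alpha_q(B_{ij}) = k \notin \{i+1, \dots, j-1\}$ and

$$\mathcal{B}_{r+1}(B_{ij}) = \{B_{ij}\}.$$

Here it is immediate that the left-hand side of (23) is less than or equal to

$$\pi(B_{ij})\,\nu(B_{ij}) \le \Lambda(B_{ij}).$$

Case 3. $\alpha_q(B_{ij}) = k \notin \{i+1, \dots, j-1\}$ and

$$\mathcal{B}_{r+1}(B_{ij}) \ne \{B_{ij}\}.$$

In this case, it follows that for some $t' \in \{q+1, \dots, r\}$ we have

$$\alpha_{t'}(B_{ij}) = k' \in \{i+1, \dots, j-1\}, \qquad (27)$$

$$\alpha_s(B_{ij}) \notin \{i+1, \dots, j-1\} \quad \text{for } s = q, \dots, t'-1; \qquad (28)$$

and thus

$$\begin{aligned}
&\sum_{B \in \mathcal{B}_{r+1}(B_{ij})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{t=q}^{r} \sum_{B' \in \mathcal{B}_t(B_{ij})} \pi(B')\,c[\alpha_t(B')] \\
&\quad = \sum_{B \in \mathcal{B}_{r+1}(B_{ij})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{s=q}^{t'-1} \pi(B_{ij})\,c[\alpha_s(B_{ij})] - \sum_{t=t'}^{r} \sum_{B' \in \mathcal{B}_t(B_{ij})} \pi(B')\,c[\alpha_t(B')] \\
&\quad \le \sum_{B \in \mathcal{B}_{r+1}(B_{ij})} \sum_{x \in B} \phi(x)\,\omega[x, \delta(B)] - \sum_{t=t'}^{r} \sum_{B' \in \mathcal{B}_t(B_{ij})} \pi(B')\,c[\alpha_t(B')];
\end{aligned}$$

and it follows from our analysis of the preceding cases that the right-hand side of this last
inequality is less than or equal to $\Lambda(B_{ij})$.

8. Notice that if $B \in \mathcal{B}_t$, then there must exist $i \in \{0, 1, \dots, n\}$ and $j \in \{i+2, \dots, n+1\}$ such that
$B = B_{ij}$. This is intuitively fairly apparent and can be proved rigorously by an argument
essentially identical to that in Moore and Whinston [1987, Corollary 6.2.6].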

□ 


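3.2.2. Corollary. The strategy $\sigma^* = \langle(\mathcal{B}^*_1, \alpha^*_1), \dots, (\mathcal{B}^*_r, \alpha^*_r), (\mathcal{B}^*_{r+1}, \delta^*)\rangle$, defined
in (17)-(22), above, is optimal for $\mathcal{D}$.

Proof. It is an immediate consequence of our definition of $\sigma^*$ (in particular of our
definition of $\alpha^*_1(X)$ and $\delta^*$) that:

$$\Omega(\sigma^*) - \Gamma(\sigma^*) = \Lambda(B_{0,n+1}) = \Lambda(X).$$

Consequently, it follows from Theorem 3.2.1 that $\sigma^*$ is optimal for $\mathcal{D}$.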

□ 


We now continue the example discussed at the beginning of this section, which
illustrates how the payoff function affects the optimal information structure. Since the
utility functions can be represented by the index value where they are maximized, we
will define $U = \mathcal{W} = [0,20]$, where $\mathcal{W}$ is the common index. Since $U = \mathcal{W}$,
$h(\cdot) = h^*(\cdot)$, and we define $h^*(\cdot)$ by

$$h^*(w) =
\begin{cases}
1/30, & 0 \le w < 1.5,\\
3/50, & 1.5 \le w < 4,\\
2/145, & 4 \le w < 11.25,\\
1/17, & 11.25 \le w < 19.75,\\
4/5, & 19.75 \le w \le 20,
\end{cases}$$

and let $W = \{1, 2, 6, 19.5, 20\}$.

We must now generate an information and decision strategy that is optimal with respect
to our payoff function. The decision problem can be defined as

$$\mathcal{D} = \langle X, \phi, \omega^*, A, \{M_a \mid a \in A\}, c, r\rangle,$$

where:

$X = \{(-\infty,0), [0,1.5), [1.5,4), [4,11.25), [11.25,19.75), [19.75,20], (20,\infty)\}$

$\phi(x_0) = 0$, $\phi(x_1) = .05$, $\phi(x_2) = .15$, $\phi(x_3) = .1$, $\phi(x_4) = .5$, $\phi(x_5) = .2$, $\phi(x_6) = 0$

$D = \{1, 2, 3, 4, 5\}$

$A = \{1, 2, 3, 4, 5\}$; i.e., experiment $j$ consists of comparing $v_u$ with $v_{j-1}$ and $v_j$

$c = 1$ and $r = 5$.
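The state probabilities listed above can be checked against the density $h^*$ of the example. Since each interior state coincides with one constant piece of $h^*$, the integral defining $p_j$ reduces to interval width times density. The sketch below uses exact rational arithmetic; the variable names are assumptions of the sketch.

```python
from fractions import Fraction as F

# (lower endpoint, upper endpoint, density) for each constant piece of h*,
# as given in the example; each piece is one interior state x_1..x_5.
pieces = [
    (F(0), F(3, 2), F(1, 30)),
    (F(3, 2), F(4), F(3, 50)),
    (F(4), F(45, 4), F(2, 145)),
    (F(45, 4), F(79, 4), F(1, 17)),
    (F(79, 4), F(20), F(4, 5)),
]

# p_j = integral of h* over state x_j = width * density for a constant piece.
phi = [(hi - lo) * h for lo, hi, h in pieces]

print([float(q) for q in phi])  # [0.05, 0.15, 0.1, 0.5, 0.2]
print(sum(phi))                 # 1
```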


When we use the payoff function

$$\omega(x,d) =
\begin{cases}
\bar{\omega} > 0 & \text{if } v_u \in x_d,\\
0 & \text{otherwise,}
\end{cases}$$

the resulting information structure is a binary tree (Figure 1). When
$\omega(x,d) = -|w_d - w_u|$ is used, the resulting information structure consists of taking a
single experiment at $x_3 = [4, 11.25)$. If the optimal value, $v_u$, is an element of $x_3$, then
we choose $w_3 = 6$ as the best available element ($d = 3$). If $v_u$ is less than $x_3$, we
choose $w_2 = 2$ as the best available element, and if $v_u$ is greater than $x_3$ we choose
$w_4 = 19.5$ as the best available element (see Appendix B for the detailed calculations).
In this case the expected value of distinguishing between $x_1$ and $x_2$ and between $x_4$ and
$x_5$ is not worth the expected cost of doing so. It is important to note that complete
information was not obtained. Computer science generally assumes that if the
requested record is in the file, then it will be found and returned. Using the decision
theoretic framework, however, that is not the case for either the common index problem
or the file search problem. Complete information will be obtained only if the
expected payoff exceeds the expected cost of doing so. It is also not necessarily the
case that the optimal algorithm will be the same for the file search problem as for the
common index problem, even if similar payoff functions are used.


We  want  now  to  determine  the  conditions  under  which  complete  information  will 
be  obtained  for  each  payoff  function. 


3.2.3. Definition: We say that a strategy $\sigma^*$ strictly dominates $\sigma$ if

$$\Omega(\sigma^*) - \Gamma(\sigma^*) > \Omega(\sigma) - \Gamma(\sigma).$$

3.2.4. Proposition: Suppose $\mathcal{D}$ is a common index problem with the payoff function

$$\omega(x,d) =
\begin{cases}
\bar{\omega} & \text{if } v_u \in x_d,\\
0 & \text{otherwise,}
\end{cases}$$

and that $r \ge n$ and $\theta\bar{\omega} > \bar{c}$, where we define $\theta > 0$ and $\bar{c}$ by

$$\theta = \min\{\pi(B) \mid B \in \mathcal{B}_A\} \quad \text{and} \quad \bar{c} = \max_{a \in A} c(a) = c.$$

If $\sigma = \langle(\mathcal{B}_1, \alpha_1), \dots, (\mathcal{B}_r, \alpha_r), (\mathcal{B}_{r+1}, \delta)\rangle$ is a feasible strategy for $\mathcal{D}$ such that for some
$B^* \in \mathcal{B}_{r+1}$, $B^* \notin \mathcal{B}_A$, then $\sigma$ is strictly dominated.

Proof. The proof of this can proceed by an argument essentially identical to the
proof of Corollary 6.5.2 and Proposition 6.5.3 in Moore and Whinston [1987].

[Figure 1. Optimal Binary Search Tree for the Common Index Problem with Payoff Function
$\omega(x,d) = \bar{\omega} > 0$ if $v_u \in x_d$, and 0 otherwise. Tree diagram not reproduced.]


We shall prove a similar result for the common index problem with the alternative
payoff function

$$\omega(x,d) = -|w_d - w_u|;$$

but it will be convenient to first establish the following lemma.

3.2.5. Lemma. Suppose $\mathcal{D}$ is a common index problem with the payoff function

$$\omega(x,d) = -|w_d - w_u|,$$

that

$$p_j = \phi(x_j) > 0 \quad \text{for } j = 1, \dots, n,$$

and that

$$\sigma = \langle(\mathcal{B}_1, \alpha_1), \dots, (\mathcal{B}_r, \alpha_r), (\mathcal{B}_{r+1}, \delta)\rangle$$

is a feasible strategy for $\mathcal{D}$. Then, given any $B^* \in \mathcal{B}_{r+1}$,

i. there exist $i \in \{1, \dots, n-1\}$ and $l \in \{1, \dots, n-i\}$ such that $B^* = \{x_i, \dots, x_{i+l}\}$, and

ii. if $\delta(B^*) \notin \{i, \dots, i+l\}$, then $\sigma$ is strictly dominated.


Proof. Part (i) of our conclusion follows from Theorem 6.2.3 of Moore and
Whinston [1987]. To prove part (ii), we distinguish two cases.

a. $\delta(B^*) \le i-1$. Here if we define a new strategy $\sigma^*$ which is identical to $\sigma$ except
that we take

$$\delta^*(B^*) = i,$$

we will have, writing $d = \delta(B^*)$:

$$\begin{aligned}
\Omega(\sigma^*) - \Gamma(\sigma^*) - [\Omega(\sigma) - \Gamma(\sigma)]
&= \sum_{x \in B^*} \phi(x)\,\omega[x, \delta^*(B^*)] - \sum_{x \in B^*} \phi(x)\,\omega[x, \delta(B^*)] \\
&= \sum_{x \in B^*} \phi(x)\,[\omega(x,i) - \omega(x,d)] \\
&= -\sum_{k=0}^{l} p_{i+k}\,(|w_{i+k} - w_i| - |w_{i+k} - w_d|) \\
&= -\sum_{k=0}^{l} p_{i+k}\,(w_{i+k} - w_i - w_{i+k} + w_d) \\
&= -\sum_{k=0}^{l} p_{i+k}\,(w_d - w_i) = -(w_d - w_i) \sum_{k=0}^{l} p_{i+k} \\
&= |w_i - w_d| \sum_{k=0}^{l} p_{i+k} > 0.
\end{aligned}$$

b. $\delta(B^*) \ge i+l+1$. A similar argument will suffice for this case, except that we
here obtain $\sigma^*$ from $\sigma$ by changing $\delta(B^*)$ to

$$\delta^*(B^*) = i + l.$$

□ 
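The closed form obtained in case (a) of the proof can be checked numerically. The sketch below borrows the index values and state probabilities of the running example and takes a hypothetical terminal block $B^* = \{x_3, x_4\}$ (so $i = 3$, $l = 1$) with a decision $d = 1$ lying below the block; these choices, and the helper name `expected_payoff`, are assumptions made for illustration.

```python
def expected_payoff(w, p, block, d):
    """Sum over states k in `block` of p_k * (-|w_k - w_d|), 1-based subscripts."""
    return sum(p[k - 1] * -abs(w[k - 1] - w[d - 1]) for k in block)


w = [1, 2, 6, 19.5, 20]
p = [0.05, 0.15, 0.1, 0.5, 0.2]
block = [3, 4]   # hypothetical terminal block B* = {x_3, x_4}
i, d = 3, 1      # left end of the block, and a decision outside it

# Gain from replacing the decision d by the decision i on B*.
gain = expected_payoff(w, p, block, i) - expected_payoff(w, p, block, d)

# Closed form from the proof: |w_i - w_d| times the probability of the block.
closed_form = abs(w[i - 1] - w[d - 1]) * sum(p[k - 1] for k in block)

print(gain, closed_form)  # both are approximately 3.0
```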


Using Lemma 3.2.5, we can now prove the following.

3.2.6. Proposition: Suppose $\mathcal{D}$ is a common index problem with the payoff function

$$\omega(x,d) = -|w_d - w_u|,$$

that $r \ge n$, and

$$\theta \min_{j} |w_{j+1} - w_j| > \bar{c}, \qquad (29)$$

where $\theta > 0$ and $\bar{c}$ are defined by

$$\theta = \min\{\pi(B) \mid B \in \mathcal{B}_A\} = \min\{p_j \mid j \in \{1, \dots, n\}\} \qquad (30)$$

and

$$\bar{c} = \max_{a \in A} c(a) = c.$$

If $\sigma = \langle(\mathcal{B}_1, \alpha_1), \dots, (\mathcal{B}_r, \alpha_r), (\mathcal{B}_{r+1}, \delta)\rangle$ is a feasible strategy for $\mathcal{D}$ such that for some
$B^* \in \mathcal{B}_{r+1}$, $B^* \notin \mathcal{B}_A$, then $\sigma$ is strictly dominated.

Proof. As noted in Lemma 3.2.5, it follows from the fact that $B^* \in \mathcal{B}_{r+1}$ that there
exist $i \in \{1, \dots, n-1\}$ and $l \in \{1, \dots, n-i\}$ such that

$$B^* = \{x_i, \dots, x_{i+l}\}, \qquad (31)$$

and, since we also have $B^* \notin \mathcal{B}_A$, it follows that

$$l \ge 1. \qquad (32)$$

Furthermore, since $r \ge n$ and $B^* \notin \mathcal{B}_A$, it is easy to see that there must exist some
$q \in \{1, \dots, r\}$ such that

$$\iota[\beta_q(B^*), \alpha_q(\beta_q[B^*])] = \{\beta_q(B^*)\}.$$

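If $q < r$, and we define a new information-gathering strategy

$$\sigma' = \langle(\mathcal{B}'_1, \alpha'_1), \dots, (\mathcal{B}'_r, \alpha'_r)\rangle$$

by letting:

$$(\mathcal{B}'_t, \alpha'_t) = (\mathcal{B}_t, \alpha_t) \quad \text{for } t = 1, \dots, q-1,$$

and for $t \ge q$:

$$\alpha'_t(B) =
\begin{cases}
\alpha_t(B) & \text{for } B \in \mathcal{B}_t \setminus \beta_t(B^*),\\
\alpha_{t+1}(B) & \text{for } B = \beta_{t+1}(B^*) \text{ and } t < r \ (\dagger),\\
0 & \text{for } B = \beta_{r+1}(B^*) = B^* \text{ and } t = r,
\end{cases}$$

it is clear that

$$\mathcal{B}'_{r+1} = \mathcal{B}_{r+1}$$

and

$$\Gamma(\sigma') \le \Gamma(\sigma).$$

($\dagger$) Notice that $\beta_{q+1}(B^*) = \beta_q(B^*)$, so that this function is well-defined.

From these considerations, and from part (ii) of Lemma 3.2.5, we can see that it suffices
to prove our result for the case where

$$B^* \in \mathcal{B}_r, \quad \alpha_r(B^*) = 0, \qquad (33)$$

and

$$\delta(B^*) \in \{i, \dots, i+l\} \qquad (34)$$

[see (31), above].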

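Now, we are going to prove that $\sigma$ is strictly dominated by defining a new strategy
$\sigma^*$ which differs from $\sigma$ only in that we will set

$$\alpha^*_r(B^*) \in \{i, \dots, i+l\},$$

and we will re-define $\delta(\cdot)$ on $\iota[B^*, \alpha^*_r(B^*)]$. Notice that for any such altered strategy,
$\sigma^*$, we will have

$$\begin{aligned}
\Omega(\sigma^*) - \Gamma(\sigma^*) - [\Omega(\sigma) - \Gamma(\sigma)]
&= \sum_{B \in \iota[B^*, \alpha^*_r(B^*)]} \sum_{x \in B} \phi(x)\,\omega[x, \delta^*(B)] - \pi(B^*)\,c[\alpha^*_r(B^*)] - \sum_{x \in B^*} \phi(x)\,\omega[x, \delta(B^*)] \\
&= \sum_{B \in \iota[B^*, \alpha^*_r(B^*)]} \sum_{x \in B} \phi(x)\,\bigl(\omega[x, \delta^*(B)] - \omega[x, \delta(B^*)]\bigr) - \pi(B^*)\,c[\alpha^*_r(B^*)].
\end{aligned} \qquad (35)$$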


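Defining

$$i^* = \delta(B^*),$$

we now distinguish two cases.

1. $i^* = i$. Here if we set

$$\alpha^*_r(B^*) = i + l - 1 = i^* + l - 1,$$

we will have

$$\iota[B^*, \alpha^*_r(B^*)] = \{\{x_i, \dots, x_{i+l-2}\}, \{x_{i+l-1}\}, \{x_{i+l}\}\},$$

where we follow the convention that

$$\{x_i, \dots, x_{i+l-2}\} = \emptyset \quad \text{if } l = 1.$$

If we then define $\delta^*(\cdot)$ on $\iota[B^*, \alpha^*_r(B^*)]$ by:

$$\delta^*(B) =
\begin{cases}
i^* & \text{if } B \in \{\{x_i, \dots, x_{i+l-2}\}, \{x_{i+l-1}\}\},\\
i + l & \text{if } B = \{x_{i+l}\},
\end{cases}$$

equation (35) becomes:

$$\begin{aligned}
\Omega(\sigma^*) - \Gamma(\sigma^*) - [\Omega(\sigma) - \Gamma(\sigma)]
&= -p_{i+l}\,(0 - |w_{i+l} - w_i|) - \pi(B^*)\,c(i + l - 1) \\
&= p_{i+l}\,(w_{i+l} - w_i) - \pi(B^*)\,c(i + l - 1) \\
&\ge \theta \min_{j} |w_{j+1} - w_j| - c(i + l - 1) > 0,
\end{aligned}$$

where the last two inequalities are by (30) and (29), respectively. Thus $\sigma$ is strictly
dominated.

2. $i^* \ge i + 1$. Here if we set

$$\alpha^*_r(B^*) = i + 1,$$

and define $\delta^*(\cdot)$ on $\iota[B^*, \alpha^*_r(B^*)]$ by:

$$\delta^*(B) =
\begin{cases}
i & \text{for } B = \{x_i\},\\
i^* & \text{for } B \in \{\{x_{i+1}\}, \{x_{i+2}, \dots, x_{i+l}\}\}
\end{cases}$$

(where again we follow the convention of letting $\{x_{i+2}, \dots, x_{i+l}\} = \emptyset$ for $l = 1$), we can
show by an argument similar to that used in case 1 that

$$\Omega(\sigma^*) - \Gamma(\sigma^*) - [\Omega(\sigma) - \Gamma(\sigma)] > 0.$$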

□ 
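The two sufficient conditions for complete information are easy to evaluate for a given problem. The following sketch checks them for the running example; the function name `complete_information_guaranteed` is hypothetical, the index values are assumed sorted, and the value 100 used for the constant payoff $\bar{\omega}$ is an assumption chosen only to exercise the first condition.

```python
def complete_information_guaranteed(p, w, c, omega_bar=None):
    """Evaluate the sufficient condition for complete information.

    p, w      : state probabilities and sorted index values of x_1..x_n
    c         : constant experiment cost
    omega_bar : if given, test theta * omega_bar > c (hit-or-miss payoff);
                otherwise test theta * min gap > c (distance payoff).
    A False result does not show that information is incomplete, because
    the conditions are sufficient only.
    """
    theta = min(p)
    if omega_bar is not None:
        return theta * omega_bar > c
    min_gap = min(b - a for a, b in zip(w, w[1:]))
    return theta * min_gap > c


w = [1, 2, 6, 19.5, 20]
p = [0.05, 0.15, 0.1, 0.5, 0.2]

# Distance payoff: theta = 0.05, smallest gap = 0.5, so the condition fails.
print(complete_information_guaranteed(p, w, c=1))                 # False
# Hit-or-miss payoff with a hypothetical omega_bar of 100.
print(complete_information_guaranteed(p, w, c=1, omega_bar=100))  # True
```

For the distance payoff the condition fails in the example, which is consistent with (though not implied by) the single-experiment strategy reported earlier.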


For both payoff functions, the condition for complete information presented in
Propositions 3.2.4 and 3.2.6 is sufficient, but not necessary. In the future, we hope to
find either necessary conditions or at least tighter sufficient conditions.

4. Conclusion

We have found and characterized a solution for a version of the common index
problem. There are other ways in which one might formalize the problem, and we
intend to explore some of these in later work. Our present view, however, is that the
principal weakness in the applicability of the solution presented here lies in our
assumption that the preferences of the ith user can be represented as a composite
function

$$u_i(s) = \hat{u}_i(|y(s) - v_i|),$$

where $y$, which maps $S$ into the common index, is the same index for all users.
Psychological researchers appear to have obtained good results with this sort of model in
a fairly wide variety of contexts; at the same time, it is clear that this is a restrictive
assumption, particularly in that it is necessary for applications that the $y(\cdot)$ function be
known (up to similarity transform), or at least that its values at the points in the file be
known. We are currently investigating the question of whether and how this assumption
can be weakened.


Appendix A
Notation

$T$              Universal set of possible elements from which to choose.
$S$              $S \subset T$. Set of available alternatives.
$U$              Set of possible users.
$u$              $u \in U$. Utility function representing a specific user.
$v_u$            The index of the ideal element for user $u$.
$h(\cdot)$       Probability distribution over $U$.
$\mathcal{W}$    Common index over $T$.
$y$              $y: T \to \mathcal{W}$. Function that assigns an index to each element of $T$.
$W$              $\{w \in \mathcal{W} \mid w = y(s),\ s \in S\}$. Set of indices of the available elements.
$h^*(\cdot)$     Probability distribution on $\mathcal{W}$, induced by $h(\cdot)$.
$X$              Set of possible mutually exclusive states.
$\phi$           Probability density function over $X$.
$\pi$            Probability distribution over the power set of $X$.
$D$              Set of final decisions.
$\omega^*$       Payoff function.
$A$              Set of initial experiments.
$M_a$            Information structure associated with $a \in A$.
$c(a)$           Cost of using action $a$.
$r$              Number of experiments that can be taken.
$\mathcal{B}$    Partition of $X$.
$B$              Element of $\mathcal{B}$.
$\iota(B,a)$     Partition of $B$ induced by experiment $a$.
$\alpha(t,B)$    Action taken on the set $B$ at time $t$.
$\Omega^*$       Expected payoff for a search strategy (social welfare function).
$\Omega(\sigma)$ Expected gross payoff for the strategy $\sigma$.
$\Gamma(\sigma)$ Expected cost of the strategy $\sigma$.
$C(B)$           Cost of a path resulting in $B$.
$\nu(B)$         Potential gross payoff associated with $B$.
$D^*(B)$         Conditionally optimal decision set for $B$.
$\mathcal{B}_A(d)$  Set of elements in $\mathcal{B}_A$ for which $d$ is the optimal decision.
$X_d$            Union of the elements in $\mathcal{B}_A(d)$ for a given $d$.
$\psi(B)$        $\max\{\pi(B \cap X_d) \mid d \in D\}$ for $B \subset X$.
$\mathcal{B}_A$  The finest possible partition of $X$.

Appendix B

Payoff function: $\omega(x,d) = -|w_u - w_d|$

Initial values:

Λ(M02) = 0      Λ(B02) = 0
Λ(M12) = 0      Λ(B13) = 0
Λ(M22) = 0      Λ(B24) = 0
Λ(M32) = 0      Λ(B35) = 0
Λ(M42) = 0      Λ(B46) = 0
Λ(M52) = 0
Λ(M62) = 0

Block   Λ         f(0)      f(1)      f(2)      f(3)      f(4)      f(5)
B03     -0.05     -0.05     -0.2      -0.2
B14     -0.2      -0.2                -0.25     -0.25
B25     -0.6      -0.675                        -0.6      -0.6
B36     -0.025    -0.025                                  -0.7      -0.7
B04     -0.3      -0.45     -0.5      -0.3      -0.35
B15     -0.75     -1.55               -1.35     -0.75     -0.95
B26     -0.725    -0.725                        -0.825    -0.8      -1.4
B05     -0.85     -4.9      -1.55     -1.4      -0.85     -1.1
B16     -0.975    -3.05               -1.675    -0.975    -1.15     -1.7
B06     -1.075    -5        -1.975    -1.725    -1.075    -1.3      -1.85

Figure 2. Details of the Dynamic Programming Solution
for $\omega(x,d) = -|w_u - w_d|$

Payoff function: $\omega(x,d) = \bar{\omega}$ if $v_u \in x_d$, and 0 otherwise

Initial values (with Λ(B_{i,i+1}) = 0 for all i):

Λ(M02) = 0      Λ(B02) = 5
Λ(M12) = 5      Λ(B13) = 15
Λ(M22) = 15     Λ(B24) = 10
Λ(M32) = 10     Λ(B35) = 50
Λ(M42) = 50     Λ(B46) = 20
Λ(M52) = 20

Block   Λ         f(0)      f(1)      f(2)      f(3)      f(4)      f(5)
B03     19.8      3         19.8      19.8
B14     24.75     3.75                24.75     24.75
B25     59.4      30                            59.4      59.4
B36     69.3      35                                      69.3      69.3
B04     29.7      4.5       25.45     29.7      29.5
B15     74.25     37.5                73.65     74.25     74
B26     79.2      40                            78.5      79.2      78.6
B05     79        40        78.45     78.6      79        78.9
B16     93.8      47.5                93.25     93.35     93.8      93.3
B06     98.7      50        97.8      98.2      98.1      98.7      98

Figure 3. Details of the Dynamic Programming Solution for
$\omega(x,d) = \bar{\omega}$ if $v_u \in x_d$, and 0 otherwise

Comments on Multiple Bifurcations

John  Guckenheimer 
Cornell  University 

This  lecture  is  a  summary  of  several  problems  involving 
bifurcations  in  systems  of  ordinary  differential  equations  that  depend 
upon  several  parameters.  All  mathematical  details  of  the  problems 
discussed  here  appear  in  papers  listed  in  the  bibliography.  The 
discussion  here  is  concerned  with  the  philosophical  viewpoint  and 
practical  significance  of  this  type  of  study. 

Many  applications  of  mathematics  involve  solving  systems  of 
ordinary  differential  equations  that  depend  upon  several  parameters. 
Although  there  are  many  well  established  algorithms  for  numerically 
integrating  individual  solutions  of  such  systems,  achieving  a  qualitative 
understanding  of  the  dynamics  of  systems  of  moderate  size  and  how 
these  dynamics  vary  with  parameters  is  still  a  formidable  task.  Even 
for simple models such as the Hénon mapping and forced oscillators with
one degree of freedom [11], the details associated with chaotic behavior in
systems with nonhyperbolic nonwandering sets still elude us. Going
further to describe the details of bifurcations in these systems is still
more complex, and it is presumptuous to expect answers that are
mathematically complete. The recent work of Chenciner [1] is a good
benchmark for the level of detail that can be achieved at this time in
studies of bifurcations which involve chaotic and quasiperiodic behavior.

By  definition,  bifurcations  are  qualitative  changes  in  the  dynamics 
of  a  system  which  occur  as  parameters  are  varied.  A  simple  example  is 
the Hopf bifurcation, which occurs as a family of periodic orbits emerges
from  an  equilibrium  which  has  a  simple  pair  of  purely  imaginary 
eigenvalues.  The  approach  to  bifurcation  theory  adopted  here  relies  upon 
a  progressive  classification  of  more  and  more  complicated  bifurcations 
which  occur  robustly  in  families  of  increasing  numbers  of  parameters. 
The  viewpoint  is  one  of  looking  for  generic  or  typical  cases  which  are 
expected to occur in a wide variety of examples. The mathematical
techniques trace their origins to transversality theorems. The search
for  simplicity  in  the  typical  bifurcations  and  their  classification  is 
complicated  by  the  presence  of  symmetry.  Many  physical  systems  are  at 
least  approximately  symmetric  with  respect  to  various  symmetry  groups, 
and  this  approach  to  bifurcation  theory  requires  that  careful,  explicit 
consideration  of  the  symmetry  be  built  in  from  the  beginning.  The 
complications  inherent  in  the  presence  of  symmetry  increase  the 
mathematical  richness  of  the  subject. 
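
As a hedged illustration of the Hopf bifurcation mentioned above, the sketch below integrates the radial part of the standard supercritical Hopf normal form, dr/dt = mu*r - r^3. This reduced equation, the function name `hopf_radius`, and the step sizes are assumptions made for the example and are not taken from the papers in the bibliography. For mu < 0 the radius decays to the equilibrium, while for mu > 0 it settles on a periodic orbit of radius sqrt(mu).

```python
import math


def hopf_radius(mu, r0=0.1, dt=0.01, steps=20000):
    """Integrate dr/dt = mu*r - r**3 with classical RK4 and return the final radius.

    This radial equation is an assumed model (supercritical Hopf normal form),
    used only to illustrate the emergence of a periodic orbit as mu crosses zero.
    """
    def rhs(r):
        return mu * r - r ** 3

    r = r0
    for _ in range(steps):
        k1 = rhs(r)
        k2 = rhs(r + 0.5 * dt * k1)
        k3 = rhs(r + 0.5 * dt * k2)
        k4 = rhs(r + dt * k3)
        r += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return r


for mu in (-0.5, 0.25, 1.0):
    predicted = math.sqrt(mu) if mu > 0 else 0.0
    print(f"mu = {mu:5.2f}  final radius = {hopf_radius(mu):.6f}  predicted = {predicted:.6f}")
```

The qualitative change at mu = 0, from a single attracting equilibrium to an attracting periodic orbit, is the kind of event that the classification described above organizes.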

From  a  practical  point  of  view,  bifurcation  problems  involving 
degenerate  equilibria  are  approached  in  the  following  manner.  One  has  a 
system  of  differential  equations  with  an  equilibrium  point  at  which  the 
linearization  has  zero  or  purely  imaginary  eigenvalues.  In  the  presence 
of  a  symmetry  group  for  the  system  of  equations,  the  symmetry  may 
force  the  eigenvalues  to  have  high  multiplicity.  An  initial  classification 
of  bifurcations  is  made,  based  upon  the  structure  of  the  generalized 
eigenspaces  associated  to  zero  and  purely  imaginary  eigenvalues.  Having 
determined  this  information,  one  proceeds  to  calculate  normal  forms  for 
bifurcations with the specified type of linear part. This entails an
algebraic calculation in which the Taylor expansion of the system at the equilibrium is
transformed  by  near  identity  coordinate  transformations  into  one  for 
which  there  are  as  few  nonzero  terms  in  the  Taylor  expansion  as 
possible.  In  the  case  of  a  system  with  a  symmetry,  one  can  insist  that 
the  coordinate  transformations  respect  the  symmetry.  Different 
bifurcations  correspond  to  the  vanishing  of  terms  or  expressions  in  the 
coefficients  of  the  normal  forms  which  play  an  essential  role  in 
determining  qualitative  features  of  the  dynamics.  Finally,  parameters 
are introduced to the normal forms in a way that hopefully produces a
family  of  equations  whose  main  dynamical  features  remain  qualitatively 
unchanged  if  the  family  is  perturbed. 

The  normal  form  equations  are  closely  related  to  the  reduced 
bifurcation  equations  that  are  often  obtained  by  introducing  asymptotic 
expansions  at  a  bifurcation.  In  either  approach,  one  is  faced  with  the 
problem  of  solving  systems  of  differential  equations  with  polynomial 
right hand sides. This is a notoriously difficult mathematical problem in
general, and each example requires individual treatment at this time.
Beyond  the  calculation  of  the  location  and  stability  of  the  equilibria, 
further  analysis  of  the  dynamics  has  relied  upon  either  numerical  data  or 
perturbation  arguments.  In  many  multiple  bifurcation  problems,  it  has 
been  possible  to  analyze  the  occurrence  of  periodic  orbits  and  invariant 
tori  via  perturbation  methods  that  begin  with  systems  which  have 
explicit  analytic  integrals.  By  appropriately  scaling  the  variables, 
coefficients  and  parameters  of  the  normal  form,  one  can  seek  to  obtain  a 
limit  in  which  the  system  does  have  explicit  integrals.  For  two 
dimensional  systems  (or  ones  which  can  be  reduced  to  an  analysis  of  two 
dimensional  systems)  with  trajectories  lying  in  algebraic  curves,  the 
perturbation  arguments  involve  the  study  of  abelian  integrals  and  how 
these  depend  upon  the  parameters  of  the  system.  These  arguments 
establish a connection between these problems and algebraic geometry.

Following  the  analysis  of  the  dynamics  of  the  normal  form 
equations,  one  still  has  an  additional  step  that  is  needed  to  complete  the 
mathematical  study  of  particular  bifurcations.  This  final  step  involves 
defining  an  appropriate  equivalence  relation  between  families  of  vector 
fields  and  showing  that  the  normal  form  is  stable  with  respect  to  this 
equivalence  relation;  i.e.,  perturbations  of  the  family  lie  in  the  same 
equivalence  class.  This  aspect  of  the  work  involves  verifying  that  the 
computation  of  the  normal  forms  to  higher  degree  produces  systems 
which  do  not  differ  qualitatively  from  the  normal  form  which  has  been 
analyzed  in  detail.  There  is  a  subtle  and  difficult  point  frequently 
encountered  here  in  that  "flat"  terms  with  zero  Taylor  expansions  may 
play  an  important  role  in  determining  the  dynamics  of  a  system  which 
has been formally reduced to its normal form. Perturbation methods
that  successfully  cope  with  these  flat  terms  are  lacking  in  the  theory. 
The  issue  of  choosing  an  equivalence  relation  in  terms  of  which  stability 
of  a  family  will  be  determined  is  also  an  awkward  and  messy  issue  to 
deal  with  in  many  of  the  examples. 

The  analysis  of  codimension  two  bifurcations  by  the  methods 
outlined  above  has  been  quite  successful.  Corresponding  to  the  three 
cases of a two dimensional generalized eigenspace for zero, a zero and a
pair of purely imaginary eigenvalues, and two pairs of purely imaginary
eigenvalues with an irrational ratio, there are analyses of two parameter
families of vector fields which are reasonably complete apart from the
difficulties mentioned in the previous paragraph [11]. Recent analytic
work has sought to extend these analyses to higher codimension
bifurcations which involve degeneracies due to the vanishing of certain
nonlinear terms in the normal forms [4, 6]. Perhaps the most intricate
problem of this type which has been examined thus far is a four
parameter family of two dimensional vector fields studied by
Dangelmayr and Guckenheimer [4]. The system they studied is

$$\dot{x} = y$$

$$\dot{y} = -(x^3 + \lambda_1 x^2 + \lambda_2 x + \lambda_3) + y(\lambda_4 - x^2).$$

One  motivation  for  studying  this  system  is  that  it  represents  the  effect 
of  symmetry  destroying  perturbations  on  a  codimension  two  bifurcation 
within  the  class  of  systems  that  are  symmetric  with  respect  to  rotation 
by π in the plane. In this work as well as that of other groups who have
studied codimension three bifurcations of two dimensional vector fields,
it is necessary to examine "global" codimension two bifurcations which
entail the existence of homoclinic or heteroclinic orbits as part of their
degeneracy.
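
The local part of the analysis of this four parameter family can be sketched numerically. The code below assumes the system reads x' = y, y' = -(x^3 + l1 x^2 + l2 x + l3) + y(l4 - x^2), with `l1`..`l4` standing for the four parameters; the function name `equilibria` and the sample parameter values are hypothetical. It locates the equilibria (real roots of the cubic, with y = 0) and returns the eigenvalues of the linearization at each one. Global features such as homoclinic orbits are not detected by this kind of calculation.

```python
import numpy as np


def equilibria(l1, l2, l3, l4, tol=1e-9):
    """Return a list of (x, eigenvalues) for the equilibria (x, 0) of the planar system.

    Equilibria satisfy y = 0 and x^3 + l1*x^2 + l2*x + l3 = 0. At (x, 0) the
    Jacobian is [[0, 1], [-(3x^2 + 2*l1*x + l2), l4 - x^2]].
    """
    found = []
    for root in np.roots([1.0, l1, l2, l3]):
        if abs(root.imag) < tol:          # keep real roots only
            x = float(root.real)
            jac = np.array([[0.0, 1.0],
                            [-(3 * x ** 2 + 2 * l1 * x + l2), l4 - x ** 2]])
            found.append((x, np.linalg.eigvals(jac)))
    return sorted(found, key=lambda pair: pair[0])


# Sample values with l1 = l3 = 0, for which the system is symmetric under
# rotation by pi, (x, y) -> (-x, -y).
for x, eigs in equilibria(0.0, -1.0, 0.0, -0.5):
    kind = "saddle" if eigs.real.max() > 0 > eigs.real.min() else "node or focus"
    print(f"x = {x:+.3f}  eigenvalues = {eigs}  ({kind})")
```

Setting `l1` or `l3` away from zero breaks the rotational symmetry, which is the perturbation the text describes.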

A  second  area  in  which  there  has  been  extensive  recent  activity 
extending  the  analysis  of  codimension  bifurcations  involves  degenerate 
bifurcations in the presence of O(2) as symmetry group. Here O(2) is the
group of 2×2 orthogonal matrices, including the reflections that reverse
orientation. There are many different cases of codimension 2
bifurcation of equilibria in the context of systems with O(2) symmetry.
This is due to the variety of representations of O(2) which produce
different combinations of purely imaginary and zero eigenvalues. A
vigorous effort which includes studies of Chossat et al. [2,3], Dangelmayr
and Knobloch [5], Golubitsky et al. [7,8], Guckenheimer [10], Keyfitz et al.
[12], and Knobloch [13] has extended the initial analysis by Ruelle of Hopf
bifurcation in the presence of O(2) symmetry to many of the
codimension  2  cases  which  occur.  There  are  applications  of  these 
results  to  a  number  of  systems,  perhaps  the  most  notable  being 
Taylor-Couette flow and the study of oscillatory behavior in systems
which are homogeneous in one spatial dimension with periodic boundary
conditions.  The  volume  edited  by  Golubitsky  and  Guckenheimer  [7] 
provides  a  thorough  overview  of  work  in  this  area  through  the  summer  of 
1985,  but  additional  work  on  problems  of  this  type  has  been  appearing 
since  then  at  a  rapid  rate. 

The  complexity  of  recent  studies  of  multiple  bifurcation 
phenomena  prompts  a  reconsideration  of  the  utility  of  the  results  for 
applications.  One  motivation  for  the  study  of  multiple  bifurcations  is 
that  they  represent  an  entree  into  parameter  space  regimes  in  which 
complicated  dynamical  behavior  occurs.  Without  integration  of  a  system 
of  equations,  the  location  of  parameter  values  at  which  multiple 
bifurcations  occur  identifies  regions  in  which  complicated  dynamics 
appears.  This  is  thoroughly  nontrivial  for  applications  like  the  dynamics 
of  chemical  reactors  in  which  the  parameter  space  of  a  system  has  a 
high  dimension  that  cannot  be  searched  with  fine  resolution  in  numerical 
studies.  It  is  also  important  for  problems  of  fluid  dynamics  in  which 
numerical  integration  of  the  underlying  equations  requires  enormous 
computational  resources  and  is  therefore  expensive  when  it  is  feasible 
at  all.  A  difficulty  which  is  encountered  in  such  problems  is  that  more 
degenerate  bifurcations  and  larger  symmetry  groups  entail  more  nonzero 
coefficients  in  normal  forms.  For  application  to  problems  in  which  one 
is  confident  of  the  validity  of  equations  describing  a  mathematical 
model,  one  finds  extensive  algebraic  manipulations  are  required  to 
compute  the  coefficients  of  the  normal  forms  associated  with  the  model. 
These  can  be  performed  in  some  cases  with  symbolic  computational 
systems  such  as  MACSYMA,  SMP,  MAPLE,  SCRATCHPAD  or  REDUCE,  but  the 
problems  quickly  strain  the  limits  of  this  technology.  In  other  cases,  a 
valid mathematical model for experimental data is not readily available
and  one  must  try  to  estimate  or  fit  the  values  of  normal  form 
coefficients  to  data.  When  there  are  many  coefficients,  this  is  a 
formidable  task. 

A  second  difficulty  which  arises  in  applying  the  results  of  a 
multiple  bifurcation  analysis  of  a  high  codimension  bifurcation  to 
experimental data is that there are qualitative features that seem to
occur in very small parameter regions and are difficult to detect
quantitatively.  This  observation  is  based  upon  numerical  computations 
in  which  it  is  difficult  to  resolve  subsidiary  bifurcations  that  occur 
close  to  one  another  in  a  multidimensional  parameter  space.  Experience 
seems  to  indicate  that  some  features  are  easily  missed  unless  one  works 
systematically,  piecing  together  complicated  stability  diagrams  from  a 
knowledge  of  the  stability  diagrams  of  lower  codimension  bifurcations 
which  occur  in  examples.  The  incidence  of  errors  in  drawing  stability 
diagrams  on  the  basis  of  numerical  computations  with  standard  floating 
point  calculations  has  been  quite  high.  Perhaps  one  should  not  be  overly 
concerned  with  these  difficulties  when  dealing  with  applications,  but  the 
goal  of  the  theory  is  to  develop  a  comprehensive  explanation  of 
complicated  dynamical  phenomena  found  in  the  mode  interactions 
associated  with  multiple  bifurcations.  Ignoring  aspects  of  the  problem 
that  are  quantitatively  insignificant  can  easily  lead  to  a  mathematically 
inconsistent picture. No good resolution of this dilemma has appeared.
It suggests that considerable caution should be used in trying to use
numerical software for such tasks as tracking periodic orbits of a
dynamical system in the close vicinity of highly degenerate bifurcations.

MEASURES OF BLOCK DESIGN EFFICIENCY RECOVERING INTERBLOCK INFORMATION

ABSTRACT. In evaluating the goodness of a class of designs, researchers
have used a measure of design efficiency proposed by F. Yates in the
thirties. This measure considers only intrablock information and does not
make use of the information contained in the interblock variance. The
measures of efficiency proposed here are dependent upon the ratio of the
interblock and intrablock components of variance, i.e., $\sigma_\beta^2/\sigma_\epsilon^2 = \gamma$. The
efficiency of one block design relative to a second may not remain invariant with
respect to this ratio. Incomplete block designs which were inefficient
under the intrablock measure now become quite efficient for some values of
$\gamma$. Likewise, the indications are that interblock information should always
be recovered when analyzing data from experiments arranged in an incomplete
block design.

I.  INTRODUCTION.  In  the  mid-thirties  Yates  (e.g.  in  1937)  introduced 
an  efficiency  factor  for  partially  confounded  factorials  and  for  incomplete 
block  designs.  The  factor  is  computed  as  the  ratio  of  the  average  variance 
of  a  difference  between  two  adjusted  means  (or  for  factorial  effects)  to 
the  variance  of  a  difference  of  two  means  from  an  orthogonal  design  such  as 
a  completely  randomized  or  randomized  complete  block  design  assuming  no 
change in the error variance $\sigma_\epsilon^2$ for the two designs. It is common
practice in statistical literature to present this efficiency factor for
designs and to discuss optimality of classes of designs in terms of the
Yates efficiency factor, which only makes use of the intrablock error
variance. No use is made of the information contained in the interblock
variance obtained from the incomplete blocks (eliminating treatment
effects) sum of squares. A more proper efficiency factor should make use
of the information contained in both the intrablock and the interblock
variances. Some measures accomplishing this are presented in this paper.

II. BALANCED INCOMPLETE BLOCK DESIGN. A classical balanced incomplete
block design (BIBD) consists of v treatments arranged in b incomplete
blocks of size k, k < v, with r repetitions of each treatment, and with each
and every pair of treatments occurring together in an incomplete block $\lambda$
times. The standard relations are $bk = rv$, $\lambda = r(k-1)/(v-1)$, and
$e = (1 - k^{-1})/(1 - v^{-1}) = v(k-1)/[k(v-1)]$. The factor e is the Yates efficiency
factor. The usual response model assumed for a BIBD is


$$Y_{hij} = n_{hij}(\mu + \rho_h + \beta_{hi} + \tau_j + \epsilon_{hij}),$$   (1)

where $Y_{hij}$ is the response for the jth treatment in the ith incomplete
block in the hth complete block, h = 1,...,r; i = 1,...,b/r; $n_{hij}$ is
one if the jth treatment occurs in the hith incomplete block and zero
otherwise; $\mu$ is a general mean effect; $\rho_h$ is the hth complete block effect;
$\beta_{hi}$ is the hith random incomplete block effect distributed with mean zero
and common variance $\sigma_\beta^2$; $\tau_j$ is the jth treatment effect; and the $\epsilon_{hij}$ are
random error effects which are distributed with mean zero and variance
$\sigma_\epsilon^2$. An analysis of variance is given in Table 1.


TABLE 1. Analysis of variance for a resolvable BIBD.

Source of variation                              Degrees of freedom   Expected value of mean square*

Total                                            bk = vr
Correction for mean                              1
Treatment (ignoring incomplete block effects)    v-1
Within treatments                                v(r-1)
  Blocks (eliminating treatment effects)         b-1                  $\sigma_\epsilon^2 + \frac{bk-v}{b-1}\sigma_\beta^2$
    Complete blocks                              r-1                  $\sigma_\epsilon^2 + k\sigma_\beta^2$
    Incomplete blocks (elim. tr.)                b-r
  Intrablock error                               vr-v-b+1             $\sigma_\epsilon^2$

* Expected mean square for $\rho_h = 0$, i.e., no complete block effects.
Intrablock information: $w = 1/\sigma_\epsilon^2$
Interblock information: $w' = 1/(\sigma_\epsilon^2 + k\sigma_\beta^2)$

For intrablock contrasts the variance of a difference between two
estimated treatment effects is $2\sigma_\epsilon^2/re$, where $e = (1 - 1/k)/(1 - 1/v)$. For
interblock contrasts the variance of a difference between two
effects is

$$\frac{2(\sigma_\epsilon^2 + k\sigma_\beta^2)}{r(1-e)}.$$   (2)


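For the combined estimator of a difference between two treatment effects
the variance is

$$\frac{2}{r}\left[\frac{e}{\sigma_\epsilon^2} + \frac{1-e}{\sigma_\epsilon^2 + k\sigma_\beta^2}\right]^{-1} = \frac{2\sigma_\epsilon^2}{r}\cdot\frac{1 + k\gamma}{1 + ke\gamma},$$   (3)

where $\gamma = \sigma_\beta^2/\sigma_\epsilon^2$.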
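
The step from (2) to (3) can be made explicit. The sketch below assumes that the combined estimator weights the intrablock and interblock estimates inversely to their variances; that weighting and the symbols $V_{\mathrm{intra}}$, $V_{\mathrm{inter}}$ and $V_c$ are introduced here for illustration only, with $\gamma$ denoting the ratio of the interblock to the intrablock variance component.

```latex
\begin{align*}
V_{\mathrm{intra}} &= \frac{2\sigma_\epsilon^2}{re}, \qquad
V_{\mathrm{inter}} = \frac{2(\sigma_\epsilon^2 + k\sigma_\beta^2)}{r(1-e)}, \\
\frac{1}{V_c} &= \frac{1}{V_{\mathrm{intra}}} + \frac{1}{V_{\mathrm{inter}}}
  = \frac{r}{2\sigma_\epsilon^2}\left[e + \frac{1-e}{1+k\gamma}\right]
  = \frac{r}{2\sigma_\epsilon^2}\cdot\frac{1 + ke\gamma}{1 + k\gamma}, \\
V_c &= \frac{2\sigma_\epsilon^2}{r}\cdot\frac{1 + k\gamma}{1 + ke\gamma}.
\end{align*}
```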


Since the intrablock contrast variance is of the form $2\sigma_\epsilon^2/re$, it
would be logical to have the combined estimator variance in the same form,
i.e., $2\sigma_\epsilon^2/re_1^*$, where

$$e_1^* = \frac{1 + ke\gamma}{1 + k\gamma} = 1 - \frac{k(1-e)\gamma}{1 + k\gamma}.$$   (4)


A second measure not involving e is

$$e_2^* = \frac{1 + k\gamma}{1 + (k+1)\gamma} = 1 - \frac{1}{k + 1 + 1/\gamma}.$$   (5)

The latter measure of efficiency depends only upon $\gamma = \sigma_\beta^2/\sigma_\epsilon^2$ and k;
note that (4) is also a function of k and $\gamma$ since $e = k/(k+1)$ for $v = k^2$ and
$r = k+1$. A comparison of the two measures is given in Table 2 for k = 3, 7, and
11. There is little to choose between $e_1^*$ and $e_2^*$ and it is suggested
that $e_1^*$ be used as a measure of efficiency rather than $e_2^*$. Note that
as $\gamma$ approaches zero the efficiency for all k approaches unity. When $\gamma$
approaches infinity, the Yates efficiency factor e is approached for all k.
For small k, e is relatively low, indicating an inefficient design. However,
$e_1^*$ indicates that designs with small k are quite efficient if $\gamma$ is
1/4 to 1/16, say. From these results, it is suggested that interblock
information should always be recovered and that inefficiency of incomplete
block design is not a problem.

TABLE 2. Intrablock - Interblock efficiencies for various values
of $\gamma$ for k = 3, 7, and 11.

                  k = 3               k = 7               k = 11
$\gamma$       $e_1^*$   $e_2^*$   $e_1^*$   $e_2^*$   $e_1^*$   $e_2^*$

0              1         1         1         1         1         1
1/32           .98       .97       .98       .98       .98       .98
1/16           .96       .95       .96       .96       .97       .96
1/4            .89       .88       .92       .92       .94       .94
1/2            .85       .83       .90       .90       .93       .93
1              .81       .80       .89       .89       .92       .92
2              .79       .78       .88       .88       .92       .92
4              .77       .76       .88       .88       .92       .92
$\infty$       .75       .75       .875      .875      .917      .917

$e_1^* = \frac{1 + ke\gamma}{1 + k\gamma}$,   $e_2^* = \frac{1 + k\gamma}{1 + (k+1)\gamma}$
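
The entries of Table 2 can be recomputed directly from (4) and (5). The sketch below is illustrative rather than the author's own computation: the function names `e1_star` and `e2_star` are hypothetical, and it assumes e = k/(k+1), which the text gives for v = k^2 and r = k+1. Exact rational arithmetic is used so that only the final formatting rounds, and printed values may differ from the table in the last digit where an entry falls exactly on a rounding boundary.

```python
from fractions import Fraction


def e1_star(k, gamma):
    """Measure (4), with e = k/(k+1) (the case v = k**2, r = k + 1)."""
    e = Fraction(k, k + 1)
    return (1 + k * e * gamma) / (1 + k * gamma)


def e2_star(k, gamma):
    """Measure (5), which depends only on k and gamma."""
    return (1 + k * gamma) / (1 + (k + 1) * gamma)


gammas = [Fraction(0), Fraction(1, 32), Fraction(1, 16), Fraction(1, 4),
          Fraction(1, 2), Fraction(1), Fraction(2), Fraction(4)]

for gamma in gammas:
    cells = []
    for k in (3, 7, 11):
        cells.append(f"{float(e1_star(k, gamma)):.2f}")
        cells.append(f"{float(e2_star(k, gamma)):.2f}")
    print(f"{str(gamma):>5}", *cells)
```

The limiting row for infinite gamma is not computed here; both measures tend to k/(k+1) in that limit.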

For a randomized complete block design, the variance of a difference
between two arithmetic means is

$$\frac{2\sigma_\epsilon^2}{r}\left[1 + \frac{(v-k)\gamma}{v-1}\right],$$   (6)

whereas the variance of a difference between two adjusted treatment means
from a BIBD is

$$\frac{2\sigma_\epsilon^2}{re_1^*}.$$   (7)

Now (6) ≥ (7), their difference (apart from the common factor $2\sigma_\epsilon^2/r$) being

$$\left[1 + \frac{(v-k)\gamma}{v-1}\right] - \frac{1 + k\gamma}{1 + ke\gamma} = \frac{v(v-k)(k-1)\gamma^2}{(v-1)[v-1 + v(k-1)\gamma]}.$$   (8)


(8) is zero when $\gamma = 0$ and/or v = k. Equation (8) could be another measure
of intrablock - interblock efficiency. Perhaps a more appropriate measure
would be a ratio rather than a difference, to obtain

$$e_3^* = \frac{1/e_1^*}{1 + (v-k)\gamma/(v-1)}.$$   (9)

The measure $e_3^*$ would conform more to the definition of efficiency originally
presented by Yates but would include both intrablock and interblock
information.
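
The algebraic identity in (8) can be checked with exact arithmetic. The sketch below is a hedged illustration: it assumes the left side of (8) is [1 + (v-k)γ/(v-1)] - (1 + kγ)/(1 + keγ) with e = v(k-1)/[k(v-1)], and the function names and the sample (v, k) pairs are hypothetical choices made for the example.

```python
from fractions import Fraction


def difference_lhs(v, k, gamma):
    """Left side of (8): bracketed complete-block term minus 1/e_1*."""
    e = Fraction(v * (k - 1), k * (v - 1))           # Yates efficiency factor
    complete_block_term = 1 + Fraction(v - k, v - 1) * gamma
    bibd_term = (1 + k * gamma) / (1 + k * e * gamma)
    return complete_block_term - bibd_term


def difference_rhs(v, k, gamma):
    """Right side of (8) in closed form."""
    numerator = v * (v - k) * (k - 1) * gamma ** 2
    denominator = (v - 1) * ((v - 1) + v * (k - 1) * gamma)
    return numerator / denominator


for v, k in [(9, 3), (7, 3), (13, 4), (5, 5)]:
    for gamma in [Fraction(0), Fraction(1, 16), Fraction(1, 2), Fraction(4)]:
        lhs = difference_lhs(v, k, gamma)
        rhs = difference_rhs(v, k, gamma)
        print(v, k, gamma, lhs == rhs, float(lhs))
```

The difference is never negative for k ≤ v, and it vanishes when gamma = 0 or v = k, in line with the remarks following (8).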


III. OTHER BLOCK DESIGNS. A class of generalized N-ary designs was
discussed by Shafiq and Federer (1979, 1983). For these designs the
response model equation (1) is replaced by

$$Y_{ghij} = \mu + \rho_h + \beta_{hi} + \tau_j + \epsilon_{ghij},$$   (10)

where $g = 0, \ldots, n_{hij}$ and when $n_{hij} = 0$ there is no response $Y_{ghij}$. The
other symbols are as defined in (1). The above authors generalized the
Yates efficiency factor for this class of designs and they also proved
that the Fisher inequality $v \le b$ holds for this general class of balanced
block designs. The efficiency factor $e_1^*$ for the generalized balanced
block design is

$$e_1^* = 1 - \frac{c - \lambda}{r(k + \gamma^{-1})},$$   (11)

where $c = \sum_h \sum_i n_{hij}^2$, $n_{hij}$ is the number of times treatment j occurs in
block hi, and $r = \sum_h \sum_i n_{hij}$ is the number of replicates for treatment j.


For the class of resolvable incomplete block designs known as lattices,
the average variance of a difference between two adjusted treatment
means is

$$\frac{2}{k+1}\left[\frac{r}{(r-1)w + w'} + \frac{k-r+1}{rw}\right].$$

Using the average variance of a difference between two adjusted
treatment effects, we could construct $e_1^*$ and $e_3^*$ for any class of incomplete
block designs. The ideas in this paper may be used to construct
efficiency factors similar to $e_1^*$ and $e_3^*$ for cubic lattices, for lattice
squares, and for other designs.

ABSTRACT. This paper introduces an easy-to-implement, quadratically constrained
optimization algorithm that is particularly suited to computing
asymptotic confidence bands for many nonlinear models. Convergence of
this algorithm is proved and several applications are discussed. These
applications include nonlinear regression, multinomial logistic
regression, and covariate-dependent reliability models.

1. INTRODUCTION. In nonlinear regression, investigators are often
interested in the underlying form of the response curve. As such,
simultaneous confidence bands are particularly useful in nonlinear
regression. If a closed confidence region exists for the regression
parameters, then in theory an approximate confidence band can be obtained
for the nonlinear response function. This is achieved by minimizing and
maximizing the response function over the confidence region for various
values of the predictor variables. Such confidence bands are generally
conservative, but close to nominal value if the predictor variable can
take on a considerable range of values. Procedures for obtaining approximate
confidence regions that are closed and have a quadratic form can be
found in Bates and Watts (1981) and Hamilton, Watts and Bates (1982).
However, even with this simplification, construction of the confidence
band generally requires solving many pairs of nonlinear programming
problems. In fact, in one of the very few papers on confidence bands in
nonlinear regression, Khorasani and Milliken (1982) only describe the
general process as a "computational tedium" and go on to discuss two
special cases where the confidence bands have a closed functional form.

Indeed, consideration of general, nonlinear optimization techniques
(for nonlinear constraints) may discourage practitioners from attempting to
compute confidence bands for nonlinear regression models. Nonlinearly
constrained optimization procedures that seem to be available in the
literature can be classified into Lagrangian-penalty function approaches
(Bertsekas, 1982) or feasible direction approaches (Ben-Israel, Ben-Tal,
Zlobec, 1981). These procedures are for quite general problems, and may
require substantial calculations to complete one iteration. This can be
undesirable since the confidence band requires the solution of many pairs
of nonlinear optimization problems. Furthermore, these algorithms are
considerably more difficult to implement in practice than the widely
available procedures used to estimate the parameters in nonlinear
regression. However, by exploiting two properties special to most
regression inference situations, it is possible to compute approximate
confidence bands for nonlinear regression models in a relatively easy and
efficient manner.

In this paper an algorithm is presented for computing confidence
bands for nonlinear regression models using a quadratic confidence region
for the regression coefficients. This algorithm is especially easy to
implement if matrix oriented software is used such as PROC MATRIX (SAS
Institute, Inc., 1985a), IML (SAS Institute, Inc., 1985b), or APL.

In  the  following  sections,  an  algorithm  for  computing  asymptotic 
confidence  bands  is  presented  and  investigated.  In  section  2,  the 
problem  is  formulated.  In  section  3,  the  algorithm  is  derived  and  a 
stepsize  modification  is  briefly  discussed.  A  convergence  theorem  is 
proved  in  section  4.  Also  in  section  4,  a  geometric  justification  is 
given  for  the  algorithm's  convergence  properties.  Finally,  in  section  5, 
applications  of  the  algorithm  to  other  nonlinear  simultaneous  estimation 
problems  are  discussed. 

2. PROBLEM STATEMENT. In this paper we define the nonlinear regression
model as

$$y_t = f(x_t, \theta) + e_t,$$

where the $e_t$'s are independent, normally distributed random errors with
zero mean and variance $\sigma^2$ for all $x_t$ values (t = 1,...,n). The
$x_t$'s are (possibly vector valued) predictor variables, and $\theta$ is a p×1
vector of unknown model parameters. Here, $f(x,\theta)$ is a known function of
$\theta$ for each $x \in X$, where X is the set of all possible predictor variable
values. For notational compactness $f(x,\theta)$ will sometimes be denoted as
simply $f(\theta)$.

It is assumed that an approximate $100(1-\alpha)\%$ confidence region for
$\theta$ can be written in the form

$$\Theta_\alpha = \{\theta : (\theta - \hat\theta)' C (\theta - \hat\theta) \le b_\alpha\},$$

where $\hat\theta$ is the least squares estimate of $\theta$, C is a p×p positive-definite,
symmetric matrix, and $b_\alpha$ is a (boundary) value that is
determined by $\alpha$. Typically, $b_\alpha$ has the form $p s^2 F(p, n-p; \alpha)$, where
$s^2$ is an estimate of $\sigma^2$, and C has the form P'P where P is the n×p
matrix of partial derivatives

$$\frac{\partial}{\partial\theta_j} f(x_t, \theta), \quad t = 1,\ldots,n, \quad j = 1,\ldots,p.$$

See Bates and Watts (1981) and Hamilton, Watts, and Bates (1982) for
other possible formulations of $b_\alpha$ and C, and for possible reparameterizations
to improve the coverage probability of the $100(1-\alpha)\%$ confidence
region.

A further assumption imposed upon f is that for each $x \in X$, and
each $\theta \in \Theta_\alpha$, f is a continuously differentiable function of $\theta$, with
$\nabla f(\theta) \ne 0$ on $\Theta_\alpha$. This condition is satisfied quite often in practice.
In fact, most $f(\theta)$ will be monotonic in each $\theta_i$ on $\Theta_\alpha$ for all $x \in X$.

Approximate $100(1-\alpha)\%$ simultaneous confidence bounds for the set of
functions $\{f(x,\theta) : x \in X\}$ can be obtained by computing

(2.1)  $[\min_{\Theta_\alpha} f(\theta), \; \max_{\Theta_\alpha} f(\theta)]$

for each $x \in X$ (Rao, 1973). Computation of the pairs (2.1) for various
x's in X requires the repeated use of nonlinear programming. However,
the optimization problems in (2.1) have two important characteristics
which will be exploited in deriving the algorithm used to construct the
confidence band for $f(\theta)$. Firstly, $\Theta_\alpha$ is a closed, quadratic region.
This property is not only useful in deriving the iterative form of the
algorithm, but it also proves useful in establishing global and local
convergence properties. Secondly, the assumption that $f(\theta)$ is continuously
differentiable with $\nabla f(\theta) \ne 0$ on $\Theta_\alpha$ for each $x \in X$ implies that
(2.1) can be simplified to

(2.2)  $[\min_{\Theta_\alpha^b} f(\theta), \; \max_{\Theta_\alpha^b} f(\theta)]$

where $\Theta_\alpha^b$ is the boundary of $\Theta_\alpha$.

3. THE ALGORITHM. In this section the algorithmic maps for seeking the
solution to (2.2) are derived. If $f(\theta)$ were linear in $\theta$ then (2.2)
could be obtained in closed form by using the method of Lagrange multipliers.
The algorithms to be presented attempt to solve (2.2) by
obtaining successive linear approximations to $f(\theta)$. In theory, stepsize
modifications may be necessary to obtain convergence.
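
For reference, the closed form in the linear case can be written out. The sketch below assumes $f(\theta) = a'\theta$ for a fixed nonzero vector $a$ (the symbol $a$ is introduced here for illustration) and applies Lagrange multipliers to the constraint $(\theta - \hat\theta)'C(\theta - \hat\theta) = b_\alpha$.

```latex
\begin{align*}
a &= 2\lambda\, C(\theta - \hat\theta)
  \;\Longrightarrow\;
  \theta - \hat\theta = \frac{1}{2\lambda}\, C^{-1} a, \\
b_\alpha &= (\theta - \hat\theta)' C (\theta - \hat\theta)
  = \frac{a' C^{-1} a}{4\lambda^2}
  \;\Longrightarrow\;
  \lambda = \pm \frac{1}{2}\left\{ b_\alpha^{-1}\, a' C^{-1} a \right\}^{1/2}, \\
\theta^{\pm} &= \hat\theta \pm b_\alpha^{1/2}\, \frac{C^{-1} a}{\{a' C^{-1} a\}^{1/2}},
  \qquad
  f(\theta^{\pm}) = a'\hat\theta \pm \left\{ b_\alpha\, a' C^{-1} a \right\}^{1/2}.
\end{align*}
```

Replacing $a$ by the gradient of a nonlinear $f$ at the current iterate gives the successive linear approximations used in this section.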

To begin, approximate $f(\theta)$ by

(3.1)  $f(\theta_0) + \nabla f(\theta_0)'(\theta - \theta_0)$,

where $\theta_0$ is the starting value. The optimal values to minimize and
maximize (3.1) over $\Theta_\alpha^b$ can be obtained by solving

(3.2)  $[\min_{\Theta_\alpha^b} \nabla f(\theta_0)'\theta, \; \max_{\Theta_\alpha^b} \nabla f(\theta_0)'\theta]$,

for a fixed $\theta_0$. Finding the $\theta$-values to solve (3.2) by Lagrange
multipliers involves solving

(3.3)  $\nabla f(\theta_0) = \lambda \nabla g(\theta)$,

where $\lambda$ is the Lagrange multiplier and $g(\theta) = (\theta - \hat\theta)'C(\theta - \hat\theta) - b_\alpha$.
Using (3.3) to solve for $(\theta - \hat\theta)$ yields

(3.4)  $(\theta - \hat\theta) = \frac{1}{2}\lambda^{-1} C^{-1} \nabla f(\theta_0)$.

Substituting the right hand side of (3.4) into the equation $g(\theta) = 0$
and solving for $\lambda$ yields

(3.5)  $\lambda = \pm \frac{1}{2} \{ b_\alpha^{-1} \nabla f(\theta_0)' C^{-1} \nabla f(\theta_0) \}^{1/2}$.

Since $g(\theta)$ is convex, all gradient vectors, $\nabla g(\theta)$, for $\theta \in \Theta_\alpha^b$, are
normal to the outside of $\Theta_\alpha^b$, the direction of steepest ascent. It
follows, then, that the $\lambda$-value associated with the minimum in (3.2)
corresponds to the minus sign in (3.5), whereas the $\lambda$-value associated
with the maximum corresponds to the plus sign.

Substituting the expression for $\lambda$ in (3.5) into (3.4), and adding
$\hat\theta$, yields the following iterative expressions,

(3.6)  $\theta_{k+1} = \hat\theta \pm b_\alpha^{1/2} C^{-1} \nabla f(\theta_k) \{ \nabla f(\theta_k)' C^{-1} \nabla f(\theta_k) \}^{-1/2}$,

for k = 0, 1, ... The plus and minus signs correspond to the iterations
for computing the maximum and minimum respectively. In practice, $\theta_0$
can be taken as $\hat\theta$ since the first iteration will put $\theta_1$ on $\Theta_\alpha^b$.
The (gradient-based) algorithmic maps corresponding to (3.6) will be
denoted by $G_*(\theta)$ for the minimum and by $G^*(\theta)$ for the maximum. To refer
to $G_*$ and $G^*$ simultaneously the generic symbol G will be used.
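
A minimal sketch of the iteration (3.6) follows, assuming NumPy is available. It is not the author's implementation (the paper mentions PROC MATRIX, IML and APL). The function names `g_map` and `iterate_g`, the example response function, and the values chosen for the estimate, the matrix C and the boundary value are all hypothetical.

```python
import numpy as np


def g_map(theta_k, grad_f, theta_hat, C_inv, b_alpha, sign):
    """One application of (3.6); sign=+1 seeks the maximum, sign=-1 the minimum."""
    g = grad_f(theta_k)
    Cg = C_inv @ g
    return theta_hat + sign * np.sqrt(b_alpha) * Cg / np.sqrt(g @ Cg)


def iterate_g(grad_f, theta_hat, C, b_alpha, sign, tol=1e-10, max_iter=200):
    """Repeat the G map starting from theta_hat until successive iterates agree."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    C_inv = np.linalg.inv(np.asarray(C, dtype=float))
    theta = theta_hat.copy()
    for _ in range(max_iter):
        new = g_map(theta, grad_f, theta_hat, C_inv, b_alpha, sign)
        if np.linalg.norm(new - theta) < tol:
            return new
        theta = new
    return theta  # not converged within max_iter; the caller should check


# Hypothetical example: f(theta) = theta_1 * (1 - exp(-theta_2 * x)) at x = 2.
x = 2.0


def f(theta):
    return theta[0] * (1.0 - np.exp(-theta[1] * x))


def grad_f(theta):
    decay = np.exp(-theta[1] * x)
    return np.array([1.0 - decay, theta[0] * x * decay])


theta_hat = np.array([10.0, 0.5])          # assumed least squares estimate
C = np.array([[4.0, 1.0], [1.0, 9.0]])     # assumed positive-definite matrix
b_alpha = 0.5                              # assumed boundary value

lo = iterate_g(grad_f, theta_hat, C, b_alpha, -1)
hi = iterate_g(grad_f, theta_hat, C, b_alpha, +1)
print("band at x = 2:", f(lo), f(hi))
```

Repeating the two calls over a grid of x values traces out the band. Every iterate after the first lies on the boundary of the region by construction, which the test below checks.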

In practice, this author has never encountered a situation where G
failed to converge. In theory, there can exist $f(\theta)$'s, as described in
section 2, for which $G_*$ ($G^*$) may produce an increased (decreased)
value. However, it will be shown in the next section that by modifying G
with stepsize adjustment, algorithmic convergence can be proved. In more
standard iterative notation (3.6) can be written as

(3.7)  $\theta_{k+1} = \theta_k + d_k$,

where $d_k = (\hat\theta - \theta_k) \pm b_\alpha^{1/2} C^{-1} \nabla f(\theta_k) \{ \nabla f(\theta_k)' C^{-1} \nabla f(\theta_k) \}^{-1/2}$.

Here, $d_k$ is interpreted as a direction vector. It takes $\theta_k$ a
distance $\|d_k\|$ in the direction induced by $d_k$. To obtain a stepsize
adjustment, (3.7) is modified to

(3.8)  $\theta_{k+1} = \theta_k + s d_k$,

where $s \in (0,1]$. The stepsize modifications of $G_*$, $G^*$, and G are
$SG_*$, $SG^*$, and SG respectively. It will be shown that there exists a $\delta \in (0,1]$
such that for any $s \in (0,\delta)$, the resulting $\theta_{k+1} = SG(\theta_k)$ will
be an improved value on $\Theta_\alpha$.

It should be noted, however, that SG may converge to a suboptimal fixed point of the SG-map. This is not surprising in that no algorithm can guarantee convergence to a global, or even local, optimum unless strong conditions are imposed upon the objective function. It is therefore recommended that if $\Theta_\alpha$ is not small, different starting values on $\partial\Theta_\alpha$ should be tried other than $G(\hat\theta)$. This can be achieved by replacing the initial gradient-direction vector, $\nabla f(\hat\theta)$, by some other direction vector, $v_0$, at the first iteration. The resulting $\theta_1$ value will fall on different parts of $\partial\Theta_\alpha$ depending on the relative orientation of $\Theta_\alpha$ and the hyperplane $v_0'\theta$.

4. CONVERGENCE. In this section a convergence theorem for $SG_*$ is proven. The proof for $SG^*$ follows similarly. Following this, a geometric interpretation of the convergence properties of G is presented.

The convergence proof in this section employs three important results in nonlinear optimization theory. We refer to these three results as: the algorithmic map convergence theorem, the descent direction theorem, and the composite map theorem. Proofs for these theorems can be found in Bazaraa and Shetty (1979). Before presenting these results, we need to elaborate on the notion of an algorithmic map. An algorithmic map is a point-to-set map, A, that assigns to each point in the domain $\Theta$ a subset of points in $\Theta$. The map A is closed at a point $\theta \in \Theta$ if $\theta_k \to \theta$ and $a_k \to a$, for $a_k \in A(\theta_k)$, imply $a \in A(\theta)$. If A is a point-to-point map, then continuity of A implies that A is closed. A is said to be closed on some $\Theta_0 \subset \Theta$ if it is closed at each point in $\Theta_0$.

ALGORITHMIC MAP CONVERGENCE THEOREM. Let $\Theta$ be a nonempty, closed set in $R^p$, and let $\Omega \subset \Theta$ be some solution set. Let $A: \Theta \to \Theta$ be an algorithmic map. Given $\theta_0 \in \Theta$, suppose that A generates the sequence $\{\theta_k\}_{k=0}^{\infty}$ which is contained in a compact subset of $\Theta$. Suppose also that there is a continuous function $f(\theta)$ such that $f(\tilde\theta) < f(\theta)$ if $\theta \notin \Omega$ and $\tilde\theta \in A(\theta)$. If A is closed over the complement of $\Omega$, then all the limit points of $\{\theta_k\}_{k=0}^{\infty}$ are in $\Omega$ and $f(\theta_k) \to f(\theta)$ for some $\theta \in \Omega$.

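The solution set considered for $SG_*$ and $G_*$ in this paper is of the type $\Omega_* = \{\theta: \theta \in \Theta_\alpha, \|\theta - G_*(\theta)\| \le \epsilon\}$, where $\epsilon$ is some small, positive, error tolerance. The solution set for $SG^*$ and $G^*$ is $\Omega^* = \{\theta: \theta \in \Theta_\alpha, \|\theta - G^*(\theta)\| \le \epsilon\}$. Both $\Omega_*$ and $\Omega^*$ contain the fixed points, $\theta_*$ and $\theta^*$, of $G_*$ and $G^*$ respectively.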

DESCENT DIRECTION THEOREM. If there is a vector d such that $\nabla f(\theta)'d < 0$ then there is a $\delta > 0$ such that $f(\theta + sd) < f(\theta)$ for all $s \in (0,\delta)$.

COMPOSITE MAP THEOREM. Suppose $M_1$ is a point-to-point map continuous at $\theta$, and $M_2$ is a point-to-set map closed at $M_1(\theta)$. Then the composite map $M_2M_1$ is closed at $\theta$.

THEOREM 4.1. The composite algorithmic map $SG_*$ ($SG^*$) generates a sequence, $\{\theta_k\}$, with its limit point in the solution set $\Omega_*$ ($\Omega^*$) containing $\theta_*$ ($\theta^*$).

PROOF. First we show that $SG_*$ always generates points in the compact set $\Theta_\alpha$. Then we show that this sequence of points on $\Theta_\alpha$ is strictly decreasing. Given a point $\theta \in \Theta_\alpha$, the point-to-point map, $G_*$, generates a point $G_*(\theta)$ on $\partial\Theta_\alpha$. Since $\Theta_\alpha$ is a convex set, any linear stepsize interpolation between $\theta$ and $G_*(\theta)$ must lie in $\Theta_\alpha$. Hence, the composite map $SG_*$ always generates points in $\Theta_\alpha$.

Next, define the function,

$h_k(\theta) = \nabla f(\theta_k)'(\theta - \theta_k)$.

Note that $\theta_{k+1} = G_*(\theta_k)$ minimizes $h_k(\theta)$ on $\Theta_\alpha$. Since $\nabla f(\theta) \ne 0$ on $\Theta_\alpha$, $h_k(\theta_{k+1}) < h_k(\theta)$ for all $\theta \in \Theta_\alpha$ not equal to $\theta_{k+1}$. Since all $\theta_k$'s are in $\Theta_\alpha$, $h_k(\theta_{k+1}) < h_k(\theta_k) = 0$. But $h_k(\theta_{k+1}) < 0$ implies

$\nabla f(\theta_k)'d_k < 0$

since $d_k = \theta_{k+1} - \theta_k$ by (3.7). Thus by the Descent Direction Theorem, for each k there is a $\delta > 0$ such that

(4.1)  $f(\theta_k + s d_k) < f(\theta_k)$

for any $s \in (0,\delta)$. Let S denote a closed line search map that generates at least one s satisfying (4.1) at each iteration for some $s \in [\eta,1]$. (S is a point-to-set map, where S is the set of s-values in $[\eta,1]$ satisfying (4.1).) Here $\eta$ is some small, positive value less than $\delta$. Any stepsize adjustment based on a line search, or simple stepsize halving, forms a closed algorithmic map if the search interval is closed and f is continuous (Bazaraa and Shetty, 1979, section 8.3). Since $G_*$ is a continuous point-to-point map, it follows by the Composite Map Theorem that $SG_*$ is a closed algorithmic map. Since $SG_*$ generates a sequence of improving points on the compact set, $\Theta_\alpha$, the Algorithmic Map Convergence Theorem implies that $SG_*$ converges as stated. The proof for $SG^*$ follows similarly.

Theorem 4.1 shows that G will converge if a stepsize adjustment is made. However, with examples used in practice G has never failed to converge quickly (3 to 4 iterations). At first this may seem curious since G is a constrained, steepest descent (ascent) mapping. The steepest descent algorithm is not generally considered to have good convergence properties. This consideration has arisen out of the tendency for steepest descent procedures to zig-zag back and forth along long, narrow valleys (or ridges) and thereby make slow progress toward the optimal point. (See Bazaraa and Shetty, 1979, Figure 8.14.) However, for most nonlinear regression models used in practice, $f(\theta)$ does not have such valleys or ridges on $\Theta_\alpha$. To see this, note that for most regression models, $f(\theta)$ is monotonic in each $\theta_j$ on $\Theta_\alpha$. Without loss of generality, we may assume that $f(\theta)$ is increasing in each $\theta_j$ if it is monotonic in each $\theta_j$. But if $f(\theta)$ is increasing in each $\theta_j$ on $\Theta_\alpha$, then $\nabla f(\theta)$ must lie in the nonnegative orthant for all $\theta \in \Theta_\alpha$. Hence the angles between any two such gradient vectors cannot exceed 90°. So, for most regression models, the contours of f cannot form long narrow valleys or ridges.
It is also reasonable to assume that G will converge quickly for a sufficiently large sample size. This is due simply to the fact that as the sample size increases the size of $\Theta_\alpha$ decreases, thus making the overall linear approximation to $f(\theta)$ on $\Theta_\alpha$ more accurate. The overall low curvature of the contours of f makes it unlikely that a fixed point of $G_*$ ($G^*$) will not be a global minimum (maximum) of $f(\theta)$ on $\Theta_\alpha$, especially if $\Theta_\alpha$ is relatively small.

5. OTHER APPLICATIONS. Besides the obvious extensions of SG to weighted least squares regression and to Scheffé-type simultaneous confidence intervals for several $f_i(\theta)$, there are other applications to confidence band estimation. Outside of the (nonlinear) mean response setting, there are some useful applications to simultaneous probability estimation. Such situations often arise from covariate dependent probability models. As discussed below, applications include multinomial logistic regression and various forms of covariate dependent reliability models. All of these models satisfy the original assumptions on $f(\theta)$ given in section 2.

Large sample confidence bands for the univariate-response, logistic regression probability model exist in closed form (Hauck, 1983). However, this is not the case for multinomial logistic regression. The probability model has the form,

(5.1)  $\exp(x'\beta_i) / \sum_{s=0}^{m} \exp(x'\beta_s)$

for the ith out of (m+1) possible outcomes, conditional on the predictor variable x. Here each $\beta_i$ is a vector, and $\beta_0 = 0$.

Heteroskedastic regression models are often used in econometrics. Some of these models allow the standard deviation of the error variable to be a linear (Rutemiller and Bowers, 1968) or log-linear (Harvey, 1976) function of predictor variables. To form a confidence band about the reliability, $\Pr(Y > y^* \mid x)$ (with $y^*$ fixed), we must work with $p(x) = 1 - F\{(y^* - x'\beta)/g(x'\gamma)\}$, where F is the distribution function of the standardized residuals and $g(\cdot)$ is the identity function or $\exp(\cdot)$. (In the former case, the estimated $x'\gamma$'s should, of course, not be near zero.) Since F is strictly increasing in applications, a confidence band about

(5.2)  $(y^* - x'\beta)/g(x'\gamma)$

can be directly converted into a confidence band for p(x). The large sample confidence band for (5.2) will not generally exist in closed form. In this case SG can be used to search for the optimal $(\beta,\gamma)$-points to form the confidence bounds for (5.2) and convert them to bounds for p(x).


For parametric, or semi-parametric, proportional hazard models (Elandt-Johnson and Johnson, 1980), it is possible to get a relatively tractable expression for the conditional probability,

(5.3)  $\Pr(\max\{T_0, \ldots, T_j\} < \min\{T_{j+1}, \ldots, T_k\} \mid x)$,

provided j is small (Peterson, 1986). Here each $T_i$ is a response time with hazard function

$h_i(t) = h_0(t)\exp(x'\theta_i)$,

where $h_0(t)$ is the common baseline hazard and $\theta_0 = 0$. For j = 0, (5.3) has the logistic form in (5.1). As in the previous two examples, SG can be used to search for optimal points on the corresponding $\partial\Theta_\alpha$ in order to compute a confidence band for (5.3).
ABSTRACT. A test is derived for the hypothesis of residual randomness in a curve fit. The test is based on a binomial process where the main problem involved is equivalent to finding the distribution of the number of runs in a sequence of coin tosses. Although this distribution turns out to be quite simple in form and easy to apply, its use in testing curve fit seems to have been neglected in the statistical literature. We also obtain the more general distribution of the number of runs in a sequence of throws of a multifaceted die, which is used to test sequences for randomness per se.

I.  INTRODUCTION.  One  question  we  wish  to  answer  in  this  paper  is: 

"Will  it  do  any  good  to  add  more  parameters  to  my  model  in  order  to  get  a 
better  fit  of  the  curve  to  the  data?"  The  answer  to  this  question  is  "yes"  if 
we  can  reject  the  hypothesis  of  residual  randomness,  and  "no"  if  we  cannot. 

We will be concerned only with the signs of the residuals and not their magnitudes, and we will examine the residual signs in a left to right manner. For example, the residual sign sequence ++++-+++-----+ contains exactly five runs of lengths 4, 1, 3, 5, and 1. The number of runs is equal to the number of changes in sign plus one.
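The run count just described is easy to compute directly. The following is a minimal sketch; the function names `count_runs` and `run_lengths` are illustrative and not taken from the paper.

```python
def count_runs(signs):
    """Number of runs = number of sign changes plus one (0 for an empty sequence)."""
    if not signs:
        return 0
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return changes + 1

def run_lengths(signs):
    """Lengths of the successive runs, scanning left to right."""
    lengths = []
    for k, s in enumerate(signs):
        if k > 0 and s == signs[k - 1]:
            lengths[-1] += 1      # same sign: extend the current run
        else:
            lengths.append(1)     # sign change (or first symbol): start a new run
    return lengths

example = "++++-+++-----+"        # a sequence with run lengths 4, 1, 3, 5, 1
```

In a curve-fitting setting the input would be the signs of the residuals taken in order of the independent variable.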

If we assume (as our hypothesis of randomness) that the probability of obtaining a positive (or negative) residual at any point is one-half, the analysis of sign runs is equivalent to the analysis of runs of heads and tails in coin flipping.

Suppose,  for  example,  that  someone  were  to  flip  a  coin  for  us  one  hundred 
times.  Our  hypothesis  is  that  this  is  done  with  a  fair  coin.  Suppose  also 
that  the  first  fifty  flips  come  up  heads  and  the  second  fifty  flips  come  up 
tails.  What  could  we  conclude  from  this?  We  could  either  conclude  (1)  that 
an extremely unlikely event has taken place (why?), or (2) that the flipper
has  flipped  a  two  headed  coin  fifty  times  and  then  exchanged  it  for  a  two 
tailed  coin  and  flipped  that  fifty  times.  Statistics  is  based  on  the  idea 
that  conclusion  (2)  is  the  shrewder  of  the  two.  In  the  context  of  residual 
signs,  we  would  just  say  that  "the  fit  is  obviously  very  poor."  Saying  this 
another  way,  we  reject  the  possibility  that  an  extremely  unlikely  event  has 
taken  place  in  favor  of  the  possibility  that  our  original  hypothesis  (on  the 
basis  of  which  the  probability  of  the  unlikely  event  was  computed)  is  false, 
i.e.,  we  reject  the  hypothesis. 
II. RUN DISTRIBUTION. First we need to answer the following question: "What is the probability of obtaining i runs in n flips of a coin?" We will call this probability $r_i$. We know from basic probability theory and the binomial distribution that the probability of obtaining i heads in n flips is given by

$b_i = \binom{n}{i} / 2^n$   $(0 \le i \le n)$

The interesting thing is that the probability of obtaining i runs in n flips may be obtained mnemonically by merely subtracting one from each variable parameter in $b_i$:

$r_i = \binom{n-1}{i-1} / 2^{n-1}$   $(1 \le i \le n)$

One feels that a result as simple and practical as this should be common knowledge. Is it? As is often the case, our result can be easily guessed by examining a special case. Take n = 5, and draw the tree of all possible outcomes.


[Figure: binary tree of the 32 possible outcomes of five flips, branching on + and - at each level. Leaf labels (number of runs in each path), left to right: 1 2 3 2 3 4 3 2 3 4 5 4 3 4 3 2 2 3 4 3 4 5 4 3 2 3 4 3 2 3 2 1]


Also, label each leaf of the tree with the number of runs in the path leading to it. The probability weight of each path is 1/32. A leaf labeled k will be called a k leaf. We then simply count the number of k leaves ($1 \le k \le 5$) to obtain the entire probability distribution:

(n = 5)

k                     1      2      3      4      5
number of k leaves    2      8     12      8      2
probability          1/16   4/16   6/16   4/16   1/16

We therefore guess that:

$r_i = \binom{n-1}{i-1} / 2^{n-1}$

in general.
Reasoning  more  rigorously  (and  recursively): 


Suppose we have labeled the leaves of an n tree. We then want to extend the n tree to an n+1 tree by adding two leaves onto each previous leaf and labeling them. If a leaf labeled k (k leaf) has two more leaves added to it, one leaf will also be labeled k and the other will be labeled k+1. Note that each k leaf "grows" a k leaf and each k-1 leaf also grows a k leaf, but no other leaves grow k leaves. Therefore, if there are $R_k^n$ k leaves in the n tree, there will be $R_k^n + R_{k-1}^n$ k leaves in the n+1 tree. We therefore have the recursion:

$R_k^{n+1} = R_{k-1}^n + R_k^n$

This is the same as the recursion for the binomial coefficients:

$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$

Again, we can easily guess the solution to the recursion as:

$R_k^n = 2\binom{n-1}{k-1}$

This checks in the recursion since

$R_{k-1}^n + R_k^n = 2\binom{n-1}{k-2} + 2\binom{n-1}{k-1} = 2\binom{n}{k-1} = R_k^{n+1}$

We also know that $R_1^n = 2$ and this checks also. We therefore have:

$r_i = R_i^n / 2^n = \binom{n-1}{i-1} / 2^{n-1}$  for any n and $1 \le i \le n$
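The closed form can be checked against a direct enumeration of all $2^n$ sign sequences for small n. The sketch below is an independent check and not part of the original derivation; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def count_runs(seq):
    """Number of runs = number of changes between neighbours plus one."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def run_pmf_bruteforce(n):
    """Enumerate every sequence of n fair coin flips and tally the run counts."""
    counts = {}
    for seq in product("+-", repeat=n):
        k = count_runs(seq)
        counts[k] = counts.get(k, 0) + 1
    return {k: Fraction(c, 2 ** n) for k, c in counts.items()}

def run_pmf_formula(n):
    """r_i = C(n-1, i-1) / 2^(n-1) for 1 <= i <= n."""
    return {i: Fraction(comb(n - 1, i - 1), 2 ** (n - 1)) for i in range(1, n + 1)}
```

Exact rational arithmetic is used so that the two distributions can be compared with `==` rather than with a tolerance.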


III. THE TEST. The main idea behind the run test in curve fitting is that regardless of the fit, the number of positive residuals will usually be roughly the same as the number of negative residuals, while the number of runs will be small for a bad fit and larger for a good fit. Technically speaking, a large number of runs violates randomness to the same extent that a small number of runs does, but we do not apply the term "bad fit" to a case where the number of runs is unusually large. Our test is therefore a one-sided test. To answer the question of how small small is, we must arbitrarily assign a significance level to the test. The significance level is simply the probability of the "unlikely event" (that the number of runs is small) based on the hypothesis of good fit. A high significance level (say 1/10) might be used if we want to be fairly liberal in adding more parameters to obtain a better fit. Lower significance levels can be used if we wish to be more conservative about adding more parameters.

The criterion for rejecting the fit is given by:

$c_i < \alpha$ = significance level

where i is the observed number of runs and

$c_i = \sum_{j=1}^{i} r_j$

is the cumulative probability that the observed number of runs is at most i.
If we wanted to, we could compute tables relating various values of n, i, $c_i$, and $\alpha$, but in this day of personal computers it hardly seems worth it, especially since the computation to test for good fit is so straightforward.


IV. ALGORITHM FOR TEST. We note in passing that

$r_{i+1}/r_i = \dfrac{\binom{n-1}{i}}{2^{n-1}} \cdot \dfrac{2^{n-1}}{\binom{n-1}{i-1}} = \dfrac{(n-1)!}{i!\,(n-1-i)!} \cdot \dfrac{(i-1)!\,(n-i)!}{(n-1)!} = (n-i)/i$

Therefore,

$r_{i+1} = (n-i)\,r_i/i$ ,  $r_1 = 1/2^{n-1}$

This recursion gives us an efficient means of computing $c_i$.

Given n (the number of residuals), i (the number of runs), and $\alpha$ (the significance level), the test algorithm will end in one of two ways (as j is increased):

1. $c_j > \alpha$ and $j \le i$ with no rejection of good fit hypothesis, or

2. $c_j \le \alpha$ and $j = i$ with rejection of good fit hypothesis.

We may then just write down the test algorithm:

    input: n, i, α
    c := 0
    j := 1
    r := 1.0/2.0**(n-1)
    loop:
        c := c + r
        if c > α and j ≤ i:  good fit, end
        if c ≤ α and j = i:  bad fit, end
        r := (n-j)*r/j
        j := j + 1
        go to loop
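The flowchart above translates almost line for line into a short routine. The sketch below is one possible rendering, not the author's program: the function name `rejects_good_fit` is hypothetical, and exact fractions are substituted for floating point so that large n does not overflow or underflow.

```python
from fractions import Fraction

def rejects_good_fit(n, i, alpha):
    """Run test: return True if c_i <= alpha (bad fit), False otherwise.

    n: number of residuals, i: observed number of runs, alpha: significance level.
    """
    if not 1 <= i <= n:
        raise ValueError("need 1 <= i <= n")
    c = Fraction(0)
    r = Fraction(1, 2 ** (n - 1))      # r_1
    for j in range(1, i + 1):
        c += r                         # c_j
        if c > alpha:
            return False               # c_i >= c_j > alpha: no rejection
        r = r * (n - j) / j            # r_{j+1} = (n-j) r_j / j
    return True                        # loop ended at j = i with c_i <= alpha
```

For the coin example in the introduction (fifty heads followed by fifty tails), `rejects_good_fit(100, 2, 0.001)` rejects, since $c_2 = 100/2^{99}$.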
V. MOMENTS OF THE RUN DISTRIBUTION. We first obtain the moment generating function of the run random variable:

$r_j = \binom{n-1}{j-1} / 2^{n-1}$

Therefore,

$M(t) = E(e^{ti}) = \sum_{j=1}^{n} r_j e^{tj} = \sum_{j=1}^{n} \dfrac{\binom{n-1}{j-1}}{2^{n-1}} e^{tj} = \sum_{j=0}^{n-1} \dfrac{\binom{n-1}{j}}{2^{n-1}} e^{t(j+1)} = \dfrac{e^t}{2^{n-1}} \sum_{j=0}^{n-1} \binom{n-1}{j} e^{tj}$

$M(t) = \dfrac{e^t}{2^{n-1}} (1+e^t)^{n-1}$

$M(0) = 1$

$M'(t) = \dfrac{e^t}{2^{n-1}} (1+e^t)^{n-1} + \dfrac{e^t}{2^{n-1}} (n-1)(1+e^t)^{n-2} e^t = \dfrac{e^t}{2^{n-1}} (1+e^t)^{n-1} \left( 1 + \dfrac{(n-1)e^t}{1+e^t} \right) = M(t) \left( \dfrac{1+ne^t}{1+e^t} \right)$

Therefore

$M'(0) = M(0) \left( \dfrac{1+n}{2} \right)$

$E(i) = \mu = \dfrac{n+1}{2}$

$M''(t) = M'(t) \left( \dfrac{1+ne^t}{1+e^t} \right) + M(t) \left( \dfrac{ne^t(1+e^t) - e^t(1+ne^t)}{(1+e^t)^2} \right) = M'(t) \left( \dfrac{1+ne^t}{1+e^t} \right) + \dfrac{M(t)\,e^t(n-1)}{(1+e^t)^2}$

$M''(0) = M'(0) \cdot \dfrac{n+1}{2} + \dfrac{n-1}{4} = \mu^2 + \dfrac{n-1}{4} = E(i^2)$

Therefore,

$\sigma^2 = \dfrac{n-1}{4}$

We therefore have the run mean and standard deviation:

$\mu = \dfrac{n+1}{2}$ ,  $\sigma = \dfrac{\sqrt{n-1}}{2}$
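As a numerical cross-check on the moment calculation, the mean and variance can be computed directly from the probabilities $r_i$. This is a small sketch with an illustrative function name; it uses exact fractions so the comparison with $(n+1)/2$ and $(n-1)/4$ is exact.

```python
from fractions import Fraction
from math import comb

def run_moments(n):
    """Mean and variance of the number of runs in n fair coin flips, from the pmf."""
    pmf = [(i, Fraction(comb(n - 1, i - 1), 2 ** (n - 1))) for i in range(1, n + 1)]
    mean = sum(i * p for i, p in pmf)
    second = sum(i * i * p for i, p in pmf)
    return mean, second - mean ** 2
```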

VI. ASYMPTOTIC RESULTS. For large n, we may further simplify our test by using approximate asymptotic formulas based on the normal approximation to our run distribution. If we think of our run random variable i as being continuous such that $f(x) = r_i$ for $i - \tfrac12 < x < i + \tfrac12$, and we further approximate f by the normal density g, we may write

$c_i = \sum_{j=1}^{i} r_j \approx \int_{-\infty}^{i+1/2} g(x)\,dx = \Phi\left( \dfrac{i + \tfrac12 - \mu}{\sigma} \right)$

where $\Phi$ is the standard normal distribution function:

$\Phi(x) = \int_{-\infty}^{x} \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$

Therefore, we have, approximately

$c_i \approx \Phi\left( \dfrac{i + \tfrac12 - \mu}{\sigma} \right)$

but

$\mu = \dfrac{n+1}{2}$  and  $\sigma = \dfrac{\sqrt{n-1}}{2}$

Therefore

$c_i \approx \Phi\left( \dfrac{i + \tfrac12 - \tfrac{n+1}{2}}{\sqrt{n-1}/2} \right) = \Phi\left( \dfrac{2i + 1 - (n+1)}{\sqrt{n-1}} \right) = \Phi\left( \dfrac{2i-n}{\sqrt{n-1}} \right)$

Equivalently,

$\Phi^{-1}(c_i) \approx \dfrac{2i-n}{\sqrt{n-1}}$

Therefore,

$i \approx \tfrac12 \left( n + \Phi^{-1}(c_i)\sqrt{n-1} \right)$

but

$\Phi^{-1}(c_i) = -\Phi^{-1}(1-c_i)$

Therefore, we have the simple asymptotic percentile

$i_\alpha = \tfrac12 \left( n - \Phi^{-1}(1-\alpha)\sqrt{n-1} \right)$

where we reject good fit on the $\alpha$ significance level if $i < i_\alpha$. For significance levels of 0.1, 0.01, and 0.001 we have from any table of the normal distribution:

$\Phi^{-1}(0.9) = 1.2816$ ,  $\Phi^{-1}(0.99) = 2.3263$  and  $\Phi^{-1}(0.999) = 3.0902$

Therefore,

$i_{0.1} = \tfrac12 (n - 1.2816\sqrt{n-1})$

$i_{0.01} = \tfrac12 (n - 2.3263\sqrt{n-1})$

and

$i_{0.001} = \tfrac12 (n - 3.0902\sqrt{n-1})$
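The asymptotic percentile $i_\alpha = \tfrac12 (n - \Phi^{-1}(1-\alpha)\sqrt{n-1})$ can be evaluated for any significance level without a table. A minimal sketch follows, using the Python standard library's `statistics.NormalDist` for $\Phi^{-1}$; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def asymptotic_run_threshold(n, alpha):
    """Approximate percentile i_alpha; good fit is rejected if the run count is below it."""
    z = NormalDist().inv_cdf(1.0 - alpha)   # Phi^{-1}(1 - alpha)
    return 0.5 * (n - z * sqrt(n - 1))
```

Since this relies on the normal approximation, it is meant for large n; for small n the exact cumulative probability $c_i$ is the safer choice.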


VII.  GENERALIZATIONS.  The  previous  formulas  can  be  used  to  test  for 
residual  randomness  in  a  curve  fit  if  we  are  only  concerned  about  the  number 
of  runs  being  too  small.  If,  on  the  other  hand,  we  are  concerned  about  ran¬ 
domness  per  se,  we  must  not  only  be  concerned  about  the  number  of  runs  being 
too  large,  but  we  must  go  even  further  in  our  testing.  Take  the  following 
sequence  as  an  example: 


HHTTHHTT . . .HHTT 

( alternate  pairs  of  heads  and  tails) 

For  a  sequence  such  as  this  of  even  modest  length,  it  becomes  intuitively  evi¬ 
dent  that  the  sequence  is  nonrandom;  yet,  our  basic  run  test  would  declare  it 
perfectly  random  because  the  actual  number  of  runs  is  (almost)  exactly  equal 
to  the  expected  number  of  runs. 

How do we mathematically reconcile this fact with our intuition? We create a new sequence of composite symbols. In our example, we have used the symbol alphabet {H, T}. We now step up to the symbol alphabet {HH, HT, TH, TT}. We could say, for instance, that

a = HH

b = HT

c = TH

d = TT


Our  example  sequence  would  then  be: 

adadad. . .ad 

a  sequence  whose  randomness  could  easily  be  rejected  by  our  basic  run  test 
(for  too  many  runs  as  opposed  to  too  few  runs).  We  could  also  take  triples  or 
quadruples  as  our  composite  symbols.  In  the  case  of  quadruples,  our  example 
sequence  would  obviously  consist  of  only  one  symbol  (too  few  runs). 
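The regrouping into composite symbols can be sketched as follows. The code assumes non-overlapping consecutive blocks (as in the pairing HH, TT above) and drops any incomplete trailing block; the function names are illustrative.

```python
def composite_symbols(seq, width):
    """Split seq into consecutive non-overlapping blocks of `width` symbols."""
    usable = len(seq) - len(seq) % width      # ignore an incomplete final block
    return [seq[k:k + width] for k in range(0, usable, width)]

def count_runs(symbols):
    """Number of runs = number of changes between neighbours plus one."""
    if not symbols:
        return 0
    return 1 + sum(1 for a, b in zip(symbols, symbols[1:]) if a != b)

flips = "HHTT" * 10                           # HHTTHHTT...HHTT, 40 flips
pairs = composite_symbols(flips, 2)           # HH, TT, HH, TT, ...
quads = composite_symbols(flips, 4)           # HHTT repeated
```

On this example the single-symbol sequence has 20 runs out of 40 flips, the pair sequence has 20 runs out of 20 symbols (too many), and the quadruple sequence has a single run (too few).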
It should now be clear that the next basic question we want to answer is: "What is the probability of obtaining i runs in n trials where each trial has m equally likely outcomes?" Drawing a multiway tree and counting will not suffice to answer this question, but reasoning recursively will. Imagine an m way tree representing n trials with its leaves labeled with the number of runs in the path leading to each leaf. We now want to grow m leaves on each leaf of the n tree in order to obtain the n+1 tree. Consider a leaf labeled i (i leaf) in the n tree. Regardless of what symbol this leaf represents, it will grow exactly one i leaf and it will grow exactly m-1 i+1 leaves. By the same token, a leaf labeled i-1 in the n tree will grow one i-1 leaf and m-1 i leaves. No other leaves of the n tree grow i leaves. Since there are $R_i^n$ i leaves and $R_{i-1}^n$ i-1 leaves in the n tree, the number of i leaves in the n+1 tree is given by the recursion:

$R_i^{n+1} = R_i^n + (m-1)R_{i-1}^n$

It is clear that we want

$R_1^n = m$

because there are m sequences with only one run.

Examining the recursion for various values of n and i enables us to once again guess the solution:

$R_i^n = m(m-1)^{i-1}\binom{n-1}{i-1}$

Checking this solution in the recursion:

$R_i^n + (m-1)R_{i-1}^n = m(m-1)^{i-1}\binom{n-1}{i-1} + (m-1)\,m(m-1)^{i-2}\binom{n-1}{i-2}$

$= m(m-1)^{i-1}\left( \binom{n-1}{i-1} + \binom{n-1}{i-2} \right)$

$= m(m-1)^{i-1}\binom{n}{i-1} = R_i^{n+1}$

Also,

$R_1^n = m(m-1)^0\binom{n-1}{0} = m$

and we see that our guessed solution is indeed correct.
Since the total number of paths in the n tree is $m^n$, we have:

$r_i = R_i^n / m^n = \binom{n-1}{i-1}(m-1)^{i-1} / m^{n-1}$

= Prob (i runs in n throws of an m faceted die)

As before, it is a trivial matter to obtain the recursion for $r_i$ to be used in computing the cumulative distribution function $c_i$.

$r_{i+1} = (m-1)(n-i)\,r_i/i$

$r_1 = 1/m^{n-1}$

$c_{i+1} = c_i + r_{i+1}$

$c_1 = r_1$

The moment generating function may be obtained in a manner similar to the m = 2 case:

$M(t) = E(e^{ti}) = \sum_{j=1}^{n} e^{tj}\binom{n-1}{j-1}(m-1)^{j-1} / m^{n-1}$

$= \sum_{j=0}^{n-1} e^{t(j+1)}\binom{n-1}{j}(m-1)^{j} / m^{n-1}$

$= \dfrac{e^t}{m^{n-1}} \sum_{j=0}^{n-1} \binom{n-1}{j} ((m-1)e^t)^{j} \cdot 1^{\,n-1-j}$

$= \dfrac{e^t}{m^{n-1}} (1 + (m-1)e^t)^{n-1}$

Using the first and second derivatives of M at t = 0 gives us the mean and standard deviation of the number of runs.

$\mu = \dfrac{n(m-1) + 1}{m}$

$\sigma = \dfrac{\sqrt{(n-1)(m-1)}}{m}$
An asymptotic normal approximation to $c_i$ is therefore given by:

$c_i \approx \Phi\left( \dfrac{i + \tfrac12 - \mu}{\sigma} \right) = \Phi\left( \dfrac{(m-1)(i - n + \tfrac12) + i - \tfrac12}{\sqrt{(n-1)(m-1)}} \right)$


Another question one might ask is: "What is the most probable number of runs (the mode) in n throws of an m faceted die?" Using the recursion formula for $r_i$, it is easily concluded that: if m divides n(m-1) exactly, we have two modes, i and i+1, where i = n(m-1)/m; and if m does not divide n(m-1) exactly, we have one mode i, where i = the greatest integer less than n(m-1)/m + 1.
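The distribution for an m faceted die and the statement about its modes can be explored numerically. The sketch below computes $r_i = \binom{n-1}{i-1}(m-1)^{i-1}/m^{n-1}$ in exact arithmetic and reads off the modes; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def run_pmf(n, m):
    """Probability of i runs (1 <= i <= n) in n throws of a fair m faceted die."""
    return {i: Fraction(comb(n - 1, i - 1) * (m - 1) ** (i - 1), m ** (n - 1))
            for i in range(1, n + 1)}

def run_modes(n, m):
    """All values of i that attain the maximum probability."""
    pmf = run_pmf(n, m)
    top = max(pmf.values())
    return [i for i, p in pmf.items() if p == top]
```

For n = 6 and m = 3, m divides n(m-1) = 12 and the two modes are 4 and 5; for n = 5 and m = 3 it does not, and the single mode is 4.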

We can use these modal values of the run distribution to define the idea of "run perfect" superrandom sequences. That is, if the basic symbol sequence and all composite symbol sequences up to a given composite symbol length l each have the modal number of runs, we say that the original sequence is run perfect up to this l value.

In  [1],  the  authors  display  a  page  of  H's  and  T's  and  ask  whether  this 
sequence  represents  the  flips  of  a  fair  coin.  They  do  not  answer  the 
question,  but  the  answer  is:  very  probably  not,  because  the  probability  of 
obtaining  the  given  number  of  runs  or  fewer  is  less  than  three  chances  in  one 
thousand.  As  a  complement  to  [1],  we  display  the  following  sequence  of  1200 
flips  which  is  demonstrably  more  random  (after  the  fact)  than  most  such 
sequences  produced  by  a  real  coin,  because  the  sequence  is  run  perfect  for 
composite  symbols  of  up  to  length  ten. 


HTTHTHTHTHTTMHHHTHTTTHTTHTTHTHMHHTTHMTMMTHHHHTTTTT 
T  HT THMHT  THTTHHHH I T 1  T  TT  HT  THMT  HM  THHT  T T  THMHf  MMHTT MTTT 
HTHHHT  MHHHHTHHT1  MTHT  HT  HTHT  TMHHTTHHHT  TMHTTTTHHTTHTH 
TTHTmTHHHHTHT  THT  TT  MMTT  HHTHHMTMTTTT  TTHT MTTMHTHHHHTH 
TTTTMHHMHHHHTTWTHHTMTHHHHTHHTHTHHHTTTHTTHTTHTTTHM 

TmmT  TT  ThTmHmmTHT  HTHHTT  tmhtttthhihtht  ttmhthtthtttht 
1  THhHHHT  TTHTTTIHTmTHTTHHTHTTHHHT  TMHHHTTTTT  THHHHTHT 
HHTH1  T  T  HMHHT  HT  TH  THHT  TT  Ml  HTHH  TMTT  T  TTHHTTHT  MMMTTT1  HT 

ttttt-hthttht  mhthhhht  tt  thhhtht  ththtttttht  mhhmmt  hht  h 

rtHHT  TIHHHMTTTTTMMHMTTTTTTHHHMHTMMTTHHTMHHTTMTTTTH 

ththhttthtttmthmwhtthhththhttmmhhtthmtthththtttthh 
HT  HT  HHT  THMHT  TTT<T  THTTTT  TT  TTTMTMTHTHHT  HT  TMMTHHMTTTTT 
MHMHTHHTTHTTTTTHHHHTMTHTMTTTTTHHMMTTHTTTTTTHTMTTTH 
TTHTTHTTTHHHTTTMHMHTTHHMTTHHMHHHHTTTTHTTTTHHHTTHTH 
THHHHTHT  HTHMTHTHTHHHTHMTHTHTTTMTTHMHMTTTHTTTTTTHHT 
HTTTHHTHHHHMHHHTTHTHTTHTTHHMTTTTTTTTMHHTTMHT  tmtttt 
HTHTTHTHMTTTTTHTHHHHHTHTTTHHTHTTHHTMHHTTHTMHTTHTTH 
THHHTHMHHTHMHHTTHTHHTHTTTMHTHTHHT1HTHTHHTTTTHHHHH1 
THTTHTTTHHHTHHHHHHHHHTHHHTTTTHTTTHHTTTHHTHHHHHHHTH 
HTTHTHHTTTTHTTHTHTHTHHHHTHTHHTHMHHTHHHHHk1HHTHTHHHH 
TTTTHHTTMHTHTHHHTHHHTHTMTTHTHHHHTTHTMMTTHTHTTHTHHT 

hthttththhtttthhtthhthmhhtththtmttttthhhhhhhhthhht 

TTTTHTTHTTTHTHTT  THHHHTHT TMHHTTHHTH TTHTHHHHHTTT TT  TT 
HHTHHHHTTHTIMHTTTHTTTT  THHTTHHIHTTHHTTHTTTHTHTHHMHH 
ON THE ESTIMATION OF SOME NETWORK PARAMETERS IN
THE PERT MODEL OF ACTIVITY NETWORKS

Salah  E.  Elmaghraby 
The PERT model of activity networks (ANs), which dates to the latter half of 1957, represents a landmark development in the theory and practice of operations research for several reasons, not the least important of which is that it represented the first attempt at the explicit recognition of randomness in the duration of activities within the context of project planning and control. The model's almost instantaneous popularity, which was bolstered by requiring all tenders to the DoD to be couched in the vernacular of PERT, invited theoreticians to take a closer look at the model's constructs, and they found them lacking.

The critique of the assumptions and derivations of the original PERT model is presented in Chapter 4 of Elmaghraby's book [3]. There one can also find a description of some early attempts at rectification relative to the estimation of the expected duration of the project, and to the estimation of its probability distribution function (pdf).

Briefly, the main line of criticism of the estimation of the pdf of the completion time of the project runs as follows. The duration of the project is the time of realization of its last event. Now, assuming the nodes of the network to be numbered sequentially from 1 to n in the 'activity-on-arc' mode of representation, the last event is node n and its time of realization is denoted by $\tau_n$. Evidently, it is a random variable (rv) equal to the maximum of a finite number of rv's, each representing the duration of a path from the start node (node 1) to the terminal node (node n). These paths are not independent because they usually share activities. Even if they were considered 'approximately' independent, their durations are only approximately normally distributed. But the time of realization of node n is definitely not normally distributed. In fact, under the assumption of independence, the pdf of $\tau_n$ is the product of the individual path pdf's, which is known to converge to the step function as the number of paths grows without bound. Finally, even if we are willing to 'approximate' the pdf of $\tau_n$ by a normal distribution, it should be with a different mean and different variance from that suggested by PERT!

This talk is based on the paper with the same title by Elmaghraby, ref. [2], which should be consulted for a more detailed exposition of the concepts outlined here.

The introduction of uncertainty in the duration of the activities has enriched the field with new concepts which came into being in response to a variety of questions. These latter may be viewed as the probabilistic counterparts to their deterministic equivalents. For instance, since (almost) any path may be the critical path (cp), in the sense of being the longest path in a realization of the project, it is meaningless to inquire of the cp, but it is meaningful to inquire of the probability that a particular path is the cp. This gave rise to the concept of the 'criticality index' of a path, and subsequently to the concept of the criticality index of an activity. The issue of the duration of the project is now re-cast into the determination of the pdf of $T_n$ or, for that matter, the pdf of $T_j$ for any node j in the network. Interestingly enough, research in the approximation of the pdf gave rise to the concept of 'network reducibility', which is of both theoretical and practical significance. Finally, the probabilistic counterpart of the critical subnetwork (i.e., the set of cp's) in the CPM model is the set of minimum number of paths whose criticality index is at least α, 0 < α ≤ 1. Alternatively, a set of K paths are referred to as the 'topmost K-cp's' when their criticality index is the maximum among all sets of K paths in the network.
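The criticality index of a path, and from it that of an activity, can be estimated by counting how often each path is the longest one in sampled realizations. The following is a minimal sketch under assumed data: the network, the triangular durations, and the name `criticality_indices` are hypothetical and serve only to show the bookkeeping.

```python
import random

def criticality_indices(n_runs=20000, seed=7):
    """Estimate path and activity criticality indices by sampling.

    Hypothetical activity-on-arc network with arcs
    a: 1->2, b: 1->3, c: 2->3, d: 2->4, e: 3->4.
    """
    rng = random.Random(seed)
    acts = {"a": (2, 4, 8), "b": (3, 6, 12), "c": (1, 2, 4),
            "d": (4, 6, 11), "e": (2, 3, 7)}  # (low, mode, high), made up
    paths = [("a", "d"), ("a", "c", "e"), ("b", "e")]
    counts = [0] * len(paths)
    for _ in range(n_runs):
        d = {k: rng.triangular(lo, hi, mode) for k, (lo, mode, hi) in acts.items()}
        lengths = [sum(d[k] for k in p) for p in paths]
        counts[lengths.index(max(lengths))] += 1  # the cp of this realization
    path_ci = [c / n_runs for c in counts]
    # an activity is critical whenever a path containing it is critical
    act_ci = {k: sum(ci for p, ci in zip(paths, path_ci) if k in p) for k in acts}
    return path_ci, act_ci

path_ci, act_ci = criticality_indices()
print(path_ci, act_ci)
```

With continuous durations ties have probability zero, so the path indices sum to one and the index of an activity is the sum of the indices of the paths through it.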

The approaches to the solution of the problems posed adopt one of the following four avenues: the analytical determination of the exact value(s), analytical approximation, analytical bounding, and estimation via Monte Carlo sampling (MCS). The difference among the various approaches is almost evident from their names, but a word of clarification is still in order.

There is no substitute for the analytical determination of the exact value(s) sought. However, it is not always possible to achieve that lofty objective, or it is possible but not economical in effort. Then approximation is admissible. There are two analytical approaches to achieve such approximation, in addition to the approach via MCS. The choice among these options is a matter of taste; it is also a matter of the requirements of the analysis. For, sometimes analytical approximation may give no clue to the error committed. If the magnitude of the error must be controlled, then analytical bounding may be the only route open to the analyst. Finally, recall that MCS does not bound the error, but only the probability of committing an error of specified magnitude.
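The last remark can be made quantitative in the simplest case, the estimation of a probability (such as a criticality index) from independent samples. The sketch below uses Hoeffding's inequality, P(|estimate - p| >= eps) <= 2 exp(-2 n eps^2); this particular inequality and the function name `mcs_sample_size` are choices made here for illustration and are not taken from the talk.

```python
import math

def mcs_sample_size(eps, delta):
    """Smallest n for which Hoeffding's bound guarantees that a sampled
    frequency misses the true probability by eps or more with
    probability at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

n = mcs_sample_size(eps=0.01, delta=0.05)
print(n)  # sample size for a 0.01 error at the 95% level
```

The guarantee is probabilistic: no finite sample size rules out an error larger than eps, it only makes such an error improbable.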

The accompanying diagram gives a global view of the methodologies adopted in the resolution of this problem, and represents a synopsis of the talk presented at the conference.


SOLIDIFICATION AND MELTING WITH
INTERFACIAL ENERGY AND ENTROPY


Morton  E.  Gurtin 
Department  of  Mathematics 
Carnegie  Mellon  University 
Pittsburgh,  PA  15213 


ABSTRACT. The classical theory of Stefan for solidification and melting is too simplistic to describe phenomena such as supercooling, superheating, and the formation of dendrites. Recently, a general theory was developed for phenomena of this type; this paper describes that theory, a theory based on general thermodynamical laws which are appropriate to a continuum and which include contributions of energy and entropy for the interface between phases.

1. INTRODUCTION. A classical problem of mathematical physics is the Stefan problem for the melting of a solid or the freezing of a liquid. The underlying theory, however, is too simplistic to describe phenomena such as supercooling, in which a liquid supports temperatures below its freezing point, or superheating, the analogous phenomenon for solids, or dendrite formation, in which simple shapes, such as spheres, evolve to complicated tree-like structures.^1 The past two decades have seen the development of more general theories^2 for phenomena of this type, a critical ingredient being a free-boundary condition at the solid-liquid interface I = I(t) in which the temperature depends on the curvature of I. In these theories questions arise as to what are the interface conditions;^3 in fact, it is not clear which of the interface conditions are constitutive assumptions and which follow directly from the underlying balance laws.

Here we shall discuss a recent paper of Gurtin [1986]. That paper develops a theoretical framework for theories of the above type starting from general thermodynamical laws which are appropriate to a continuum and which include interfacial contributions for both energy and entropy.

2. GENERAL RESULTS. The chief assumptions - apart from general equations of state for the bulk and interfacial quantities - are that the interface I produce no entropy and that the temperature be continuous across I. Among the main results are the interface conditions^4


^1 Cf. Chalmers [1964] and Delves [1974] for discussions of these phenomena.

^2 Mullins [1960], Mullins and Sekerka [1963, 1964], Voronkov [1965]. See also the review articles by Sekerka [1968, 1973, 1984], Chernov [1972], Delves [1974], and Langer [1980].

^3 Cf. Rogers [1983] for a discussion of some of the inconsistencies in the literature.

^4 Cf. Moeckel [1975], Fernandez-Diaz and Williams [1979], and Wollkind [1979] for the first relation in (1).

$$
\begin{aligned}
[q]\cdot m &= v[E] - v\kappa e - \overset{\circ}{e} && \text{on } I,\\
T &= ([E] - \kappa e)/([S] - \kappa s) && \text{on } I,\\
v\,m\cdot n &= 0 && \text{on } \partial I,
\end{aligned}
\tag{1}
$$

in which T is the temperature; [E], [S], and [q] are the jumps^5 in energy, entropy, and heat flux across the interface; e and s are the interfacial values of energy and entropy; κ, v, and m, respectively, are, for the interface, the sum of principal curvatures, the normal velocity, and the unit normal vector (outward relative to the region occupied by phase 1); $\overset{\circ}{e}$ is the time derivative of e following the interface; n is the outward unit normal on the boundary of the region B occupied by the body.

The first of (1) is essentially the first law of thermodynamics at the interface. The second - derived within the fully dynamical theory - is a condition of local equilibrium expressing balance of free-energy across the interface. The third is a contact condition for that portion of the interface which intersects the boundary of B; it asserts that - where the interface meets ∂B - it is orthogonal to ∂B or stationary.

Two types of boundary conditions are discussed: an isolated boundary, on which q · n = 0, and an isothermal boundary, on which T is constant. It is shown that, for either of these boundary conditions,

    interfacial area is uniformly bounded in time,    (2)

at least when B is bounded.

3. EQUILIBRIUM THEORY. For isothermal boundary conditions stable states are defined as minimizers of a global free-energy.

It is assumed that the bulk free-energies cross at a single temperature $T_M$; it follows that - for bounded B - stable states are always single phase,^6 the stable phase being the phase with lower free-energy. $T_M$ therefore represents the temperature at which a change in stable phase occurs, and, for that reason, is referred to as the transition temperature.


^5 Our convention for jumps and for the latent heat L is "phase 2 minus phase 1", with phases labeled so that L > 0. Thus for a solid-liquid system phase 2 would denote the liquid.

^6 Here it is important to emphasize that the boundary is held at constant temperature; two-phase solutions are possible when, for example, the body is isolated and the total energy constrained.

The question of stability for unbounded B is far more interesting. There the results, expressed in terms of a solid-liquid system in isothermal equilibrium, assert that:^7

(i) There are no stable states in which the bounded phase is solid and the unbounded phase supercooled liquid.

(ii) Under the conditions of (i), minimizing sequences of the free-energy are consistent with interfacial instabilities such as the formation of complicated arrays of thin spikes, behavior indicative of dendritic growth.
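Assertion (i) can be illustrated in the simplest geometry, a single solid sphere in an unbounded supercooled liquid. The sketch below is not the paper's argument; it assumes a free energy of the form (interfacial free energy per unit area) x area minus (bulk free-energy gain per unit volume) x volume, with hypothetical values for both coefficients, and shows that this quantity has a stationary point that is a maximum and is unbounded below as the radius grows, so no bounded solid sphere minimizes it.

```python
import math

def sphere_free_energy(R, f, g):
    """Free energy of a solid sphere of radius R relative to pure liquid.

    f: interfacial free energy per unit area (assumed > 0)
    g: bulk free-energy gain per unit volume on solidifying
       (assumed > 0, i.e. the liquid is supercooled)
    """
    return 4.0 * math.pi * R * R * f - (4.0 / 3.0) * math.pi * R ** 3 * g

def critical_radius(f, g):
    """Radius at which dG/dR = 4*pi*R*(2*f - g*R) vanishes."""
    return 2.0 * f / g

f, g = 0.5, 2.0                      # hypothetical, nondimensional values
Rc = critical_radius(f, g)
for R in (0.5 * Rc, Rc, 2.0 * Rc, 10.0 * Rc):
    print(f"R = {R:5.2f}  G = {sphere_free_energy(R, f, g):10.3f}")
```

Past the critical radius the free energy decreases without bound, which is consistent with the absence of stable states asserted in (i); the more delicate spike-like minimizing sequences of (ii) are not captured by a spherical trial shape.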

4. QUASI-STATIC THEORY. A quasi-static model is developed for situations in which the interface moves slowly compared with the time scale for heat conduction. The chief constitutive hypothesis underlying this model is that - in each of the phases - both the bulk energy and the bulk entropy are constant. It is also assumed that the conductivities $k_i$, the interfacial energy e, and the interfacial entropy s are constant. Let $B_i(t)$ denote the subregion of B occupied by phase i (i = 1,2), and let

$$u = T - T_M.$$

Then these assumptions lead to the system^8

$$
\begin{aligned}
\Delta u &= 0, \quad q = -k_i\,\mathrm{grad}\,u && \text{in } B_i,\\
u &= -h\kappa/(1 - a\kappa), \quad [q]\cdot m = (L - \kappa e)\,v && \text{on } I,\\
v\,m\cdot n &= 0 && \text{on } \partial I,
\end{aligned}
\tag{3}
$$

where

$$h = T_M f(T_M)/L, \qquad a = T_M s/L \tag{4}$$

with f(·) the interfacial free-energy and L the latent heat. Global growth-conditions are established for (3) under the two types of boundary conditions discussed previously. In particular, letting

$$p = T_M f(T_0)/L, \qquad \beta = e/L,$$


^7 In their paper of [1963], Mullins and Sekerka, working with the dynamical theory described by (13), established the instability of the interface for infinitesimal perturbations of a sphere solidifying in a supercooled melt. The assertions (i) and (ii) are analogs, within the equilibrium theory, of the Mullins-Sekerka instability.

^8 grad, div, and Δ are the gradient, divergence, and Laplacian operators; for F = F(t) and f = f(x,t), $F^{\bullet} = dF/dt$ and $f' = \partial f/\partial t$; vol(·) and area(·) denote the volume and area measures.

it is shown that:

(iii) for an isolated boundary,

$$\mathrm{vol}(B_2)^{\bullet} + \beta\,\mathrm{area}(I)^{\bullet} = 0, \qquad \mathrm{area}(I)^{\bullet} \le 0; \tag{5}$$

(iv) for an isothermal boundary ($u = u_0$ = constant on ∂B),

$$-u_0\,\mathrm{vol}(B_2)^{\bullet} + p\,\mathrm{area}(I)^{\bullet} \le 0. \tag{6}$$

The results (5) and (6) seem to indicate (asymptotically as t → ∞) interfacial instabilities such as those described in (i) and (ii), as well as an instability characterized by a solid phase whose volume tends to zero, but whose interfacial area does not. This phenomenon is referred to as the formation of a dendrite with null volume.


5.  THEORIES  BASED  ON  THE  CAPILLARITY  RELATION.  Thus  far  no 
assumptions  have  been  made  concerning  the  size  of  the  interfacial 
quantities;  even  though  the  hypotheses  underlying  (3)  are  strong, 
the  theory  is  exact  in  the  sense  that  the  underlying  equations 
are  fully  compatible  with  the  first  two  laws  of  thermodynamics. 


Consider now the general relations (1), but in situations for which interfacial energy and entropy are small. Then, to within terms of higher order in these quantities,

$$[q]\cdot m \approx Lv, \qquad u \approx -h\kappa, \tag{7}$$

with h as defined in (4). The relations (7) are central to the modern work on solidification.
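As a check on how the interface conditions of the quasi-static system (3) and the capillarity relation (7) follow from the second relation in (1), the short derivation below may be helpful. It rests on assumptions that the summary does not spell out: that for constant bulk energies and entropies the jumps are $[E] = L$ and $[S] = L/T_M$ (the bulk free-energies cross at $T_M$), and that the interfacial free-energy is $f(T) = e - Ts$.

```latex
\begin{align*}
T &= \frac{[E]-\kappa e}{[S]-\kappa s}
   = \frac{L-\kappa e}{L/T_M-\kappa s}
   = T_M\,\frac{1-\kappa e/L}{1-a\kappa},
   \qquad a = \frac{T_M s}{L},\\[4pt]
u = T - T_M
  &= T_M\,\frac{a\kappa-\kappa e/L}{1-a\kappa}
   = -\frac{T_M\,(e-T_M s)}{L}\,\frac{\kappa}{1-a\kappa}
   = -\frac{h\kappa}{1-a\kappa},
   \qquad h = \frac{T_M f(T_M)}{L},\\[4pt]
u &= -h\kappa\,\bigl(1 + a\kappa + \dots\bigr) \approx -h\kappa,
   \qquad [q]\cdot m = (L-\kappa e)\,v \approx Lv .
\end{align*}
```

The last line drops terms that are of second order in the interfacial quantities e and s, which is the sense of "small" used for (7).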


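A model is developed based on the interface conditions (7) in conjunction with assumptions of constant specific heats and constant conductivities. These assumptions lead to the equations

$$
\begin{aligned}
C_i u' &= -\mathrm{div}\,q, \quad q = -k_i\,\mathrm{grad}\,u && \text{in } B_i,\\
u &= -h\kappa, \quad [q]\cdot m = Lv && \text{on } I,\\
v\,m\cdot n &= 0 && \text{on } \partial I.
\end{aligned}
\tag{8}
$$

The assumption

$$C_1 = C_2 \tag{9}$$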


is common in the literature; granted (9), the following growth conditions follow from (8):

(v) for an isolated boundary,

$$
\{\mathrm{vol}(B_2) + C V u_m\}^{\bullet} = 0, \qquad
\Big\{h\,\mathrm{area}(I) - u_m\,\mathrm{vol}(B_2) + \frac{C}{2}\int_B (u-u_m)^2\Big\}^{\bullet} \le 0;
\tag{10}
$$

(vi) for an isothermal boundary,

$$
\Big\{h\,\mathrm{area}(I) - u_0\,\mathrm{vol}(B_2) + \frac{C}{2}\int_B (u-u_0)^2\Big\}^{\bullet} \le 0.
\tag{11}
$$

Here V = vol(B), $C = C_1/L$, and $u_m$ is the mean value of u over B.

^9 Cf. the references cited in Footnote 2.

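A standard model for solidification follows from (8) when the terms $C_i u'$ are neglected:

$$
\begin{aligned}
\Delta u &= 0, \quad q = -k_i\,\mathrm{grad}\,u && \text{in } B_i,\\
u &= -h\kappa, \quad [q]\cdot m = Lv && \text{on } I,\\
v\,m\cdot n &= 0 && \text{on } \partial I.
\end{aligned}
\tag{12}
$$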

Here (10) and (11) are replaced by:

(vii) for an isolated boundary,

$$\mathrm{vol}(B_1)^{\bullet} = 0, \qquad \mathrm{area}(I)^{\bullet} \le 0; \tag{13}$$

(viii) for an isothermal boundary,

$$u_0\,\mathrm{vol}(B_1)^{\bullet} + h\,\mathrm{area}(I)^{\bullet} \le 0. \tag{14}$$


NUMERICAL COMPUTATION OF THE APPROXIMATE ANALYTICAL
SOLUTION OF A STEFAN'S PROBLEM IN A FINITE DOMAIN

Shunsuke  Takagi 
U.S.  Army  Corps  of  Engineers 
Cold  Regions  Research  and  Engineering  Laboratory 
Hanover,  NH  03755-1290 


ABSTRACT. The approximate analytical solution of Stefan's problem in a finite domain with constant boundary and initial conditions was found and reported last year. This year we report the numerical computation of the analytical solution. We start with the presentation of the temperature solutions, which are easily verifiable. The interfacial position is determined by solving a complicated nonlinear equation composed of a summation of transcendental functions, which we describe in detail.

I. THE ANALYTICAL SOLUTION. We consider the simplest freezing problem in a finite domain $0 \le x \le \ell$. The boundary temperatures $T_A$ at x = 0 and $T_B$ at x = ℓ are constant, the latter being also the initial temperature. The freezing temperature is $T_F$.

At t = 0, a new phase emerges at x = 0, whose temperature we express by $T_I(x, \kappa_I t)$, where $\kappa_I$ is the thermal diffusivity of the new phase. The domain of the new phase is 0 < x < s(t), where initially $s(t) = s_0\sqrt{t}$ and finally s(∞) = constant, $s_0$ being a constant. We express the temperature of the old phase by $T_{II}(x, \kappa_{II} t)$, where $\kappa_{II}$ is the thermal diffusivity of the old phase. The domain of the old phase is s(t) < x < ℓ. The quantities of the new and old phases are designated by the roman numerals I and II, respectively, used as a sub- or super-index. The conditions to be satisfied are:


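$$T_I(0, \kappa_I t) = T_A, \tag{1}$$

$$T_I(s(t), \kappa_I t) = T_{II}(s(t), \kappa_{II} t) = T_F, \tag{2}$$

$$K_I \frac{\partial T_I}{\partial x}\Big|_{s(t)} - K_{II} \frac{\partial T_{II}}{\partial x}\Big|_{s(t)} = L\rho\,\frac{ds}{dt}, \tag{3}$$

$$T_{II}(\ell, \kappa_{II} t) = T_B, \tag{4}$$

and

$$T_{II}(x, 0) = T_B \quad \text{for } 0 < x < \ell. \tag{5}$$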

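The approximate analytical solution we have found [1] is as follows: The new and old phase temperatures are given by

$$
T_I(x, \kappa_I t) = T_A + (T_F - T_A)\,
\operatorname{Erf}\frac{x}{\sqrt{4\kappa_I t}} \Big/ \operatorname{Erf}\frac{s(t)}{\sqrt{4\kappa_I t}}
\tag{6}
$$

for 0 < x < s(t), and

$$
T_{II}(x, \kappa_{II} t) = T_B - \frac{T_B - T_F}{R_{II}}
\sum_{n=0}^{N}\Big[\operatorname{erfc}\frac{2n\ell + x}{\sqrt{4\kappa_{II} t}}
- \operatorname{erfc}\frac{2n\ell + 2\ell - x}{\sqrt{4\kappa_{II} t}}\Big],
\tag{7}
$$

for $s(t) \le x \le \ell$, respectively, where

$$
R_{II} = \sum_{n=0}^{N}\Big[\operatorname{erfc}\frac{2n\ell + s(t)}{\sqrt{4\kappa_{II} t}}
- \operatorname{erfc}\frac{2n\ell + 2\ell - s(t)}{\sqrt{4\kappa_{II} t}}\Big].
\tag{8}
$$

Substituting (6) and (7) into (3), we find the equation for the determination of s(t),

$$
\frac{K_I (T_F - T_A)}{\sqrt{4\kappa_I t}}\,
\frac{i^{-1}\operatorname{erfc}\big(s(t)/\sqrt{4\kappa_I t}\big)}{\operatorname{Erf}\big(s(t)/\sqrt{4\kappa_I t}\big)}
- \frac{K_{II} (T_B - T_F)}{\sqrt{4\kappa_{II} t}}\,\frac{S_{II}}{R_{II}}
= L\rho\,\dot{s}(t),
\tag{9}
$$

where

$$
S_{II} = \sum_{n=0}^{N}\Big[i^{-1}\operatorname{erfc}\frac{2n\ell + s(t)}{\sqrt{4\kappa_{II} t}}
+ i^{-1}\operatorname{erfc}\frac{2n\ell + 2\ell - s(t)}{\sqrt{4\kappa_{II} t}}\Big].
\tag{10}
$$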


It is obvious that the temperature solutions (6) and (7) satisfy all the assigned conditions but not the differential equation of heat conduction, unless N = 0, $s(t) = s_0\sqrt{t}$ and the second summand in $R_{II}$ is negligible. In other words, the solution is exact at the initial stage, where the semi-infinite domain solution is applicable, but is approximate beyond the initial stage. The approximate solution, however, approaches the exact one in the limit as t and N increase indefinitely together, as proved in [1]. We may remark that if we adopt a hypothetical procedure in which the t in s(t) and in $R_{II}$ is an extraordinary parameter that may be kept constant during the time differentiation, the solution is always exact.
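The statement that (6) and (7) satisfy the assigned boundary and interface conditions can be checked numerically for any fixed t, s(t) and N. The sketch below assumes the image-sum form of (7) in which the n-th bracket is erfc((2nℓ + x)/√(4κ_II t)) − erfc((2nℓ + 2ℓ − x)/√(4κ_II t)) and R_II is the same sum evaluated at x = s(t); the function names and the sample values of t, s and N are chosen here for illustration only.

```python
import math

def T_new(x, t, s, kI, TA, TF):
    """New-phase temperature, equation (6)."""
    a = math.sqrt(4.0 * kI * t)
    return TA + (TF - TA) * math.erf(x / a) / math.erf(s / a)

def bracket_sum(y, t, ell, kII, N):
    """Sum over n = 0..N of the bracket in (7), evaluated at position y."""
    a = math.sqrt(4.0 * kII * t)
    return sum(math.erfc((2 * n * ell + y) / a)
               - math.erfc((2 * n * ell + 2 * ell - y) / a)
               for n in range(N + 1))

def T_old(x, t, s, ell, kII, TF, TB, N):
    """Old-phase temperature, equation (7), with R_II from (8)."""
    R = bracket_sum(s, t, ell, kII, N)
    return TB - (TB - TF) * bracket_sum(x, t, ell, kII, N) / R

# Illustrative values: diffusivities as quoted later in the paper,
# everything else (t, s, ell, N) picked arbitrarily for the check.
kI, kII = 1.15e-6, 1.44e-7
TA, TF, TB = -5.0, 0.0, 5.0
ell, t, s, N = 1.0, 1.0e6, 0.3, 3
print(T_new(0.0, t, s, kI, TA, TF),               # condition (1)
      T_new(s, t, s, kI, TA, TF),                 # condition (2), new phase
      T_old(s, t, s, ell, kII, TF, TB, N),        # condition (2), old phase
      T_old(ell, t, s, ell, kII, TF, TB, N))      # condition (4)
```

The check says nothing about the heat equation itself, which, as the text notes, is satisfied only approximately once s(t) departs from its initial square-root growth.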


The summations in (7), (8) and (10) follow a convention. Because the magnitudes of erfc x and $i^{-1}\operatorname{erfc} x$ are governed by $\exp(-x^2)$, we apply the same convention to all the summations. We explain the motivation for the convention here. It is formalized later.

Initially the second summand in the 0th (i.e. n = 0) bracket and the dual summands in all the subsequent brackets are less than some small number $10^{-m}$, where m is a positive number. These summands are in this case regarded as negligible. The temperature distributions then reduce to that of a semi-infinite domain. We call $10^{-m}$ the threshold number, and m the threshold power.

At an appropriate time, called the first lead time, the second summand in the 0th bracket exceeds the threshold value. We then add it, completing the 0th bracket. At the second lead time, the first summand in the 1st (i.e. n = 1) bracket exceeds the threshold value. Summands appear successively in this way and the brackets are completed successively. We mainly let m = 10 in our numerical computation.

To describe the successive emergence of latent summands, we use a sequence of lead times as a parameter. The parametric lead time grows to infinity together with N.

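II. INTERFACIAL COORDINATES. Introducing the nondimensional interfacial coordinates ξ and η by

$$\xi = s(t)/\ell \tag{11}$$

and

$$\eta = (4\kappa_{II} t)^{1/2}/\ell \tag{12}$$

and defining

$$
U_k(\xi, \eta) = \sum_{n=0}^{N}\Big[i^{-k}\operatorname{erfc}\frac{2n + \xi}{\eta}
+ (-1)^{k-1}\, i^{-k}\operatorname{erfc}\frac{2n + 2 - \xi}{\eta}\Big]
\tag{13}
$$

for integers $k \ge 0$, we rewrite the transcendental equation (9) as

$$
W(\xi, \eta) \equiv \operatorname{Erf}(\beta\xi/\eta) - (2a/\sqrt{\pi})\exp\!\big(-(\beta\xi/\eta)^2\big)\cdot
U_0(\xi, \eta)\big/\{(b\xi/\eta)\,U_0(\xi, \eta) + U_1(\xi, \eta)\} = 0,
\tag{14}
$$

where

$$\beta = (\kappa_{II}/\kappa_I)^{1/2}, \tag{15}$$

$$a = \beta\,(K_I/K_{II})(T_F - T_A)/(T_B - T_F), \tag{16}$$

and

$$b = 2L\rho\,\kappa_{II}/(K_{II}(T_B - T_F)). \tag{17}$$

The domains of ξ and η in W(ξ, η) are:

$$0 < \xi < \xi_\infty \tag{18}$$

and

$$0 < \eta < \infty, \tag{19}$$

where

$$\xi_\infty = s(\infty)/\ell, \tag{20}$$

at which η becomes infinite. s(∞) is found by solving

$$\frac{K_I (T_F - T_A)}{s(\infty)} - \frac{K_{II} (T_B - T_F)}{\ell - s(\infty)} = 0, \tag{21}$$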


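an expression of the linearity of the final temperature distribution. The derivatives of $U_k(\xi, \eta)$ are given by

$$\eta\,\partial U_k/\partial\xi = -U_{k+1}, \tag{22}$$

$$\eta\,\partial U_k/\partial\eta = k\,U_k + \tfrac{1}{2}\,U_{k+2}. \tag{23}$$

The derivatives $W_\xi$ and $W_\eta$ are given by

$$
\frac{\sqrt{\pi}}{2}\,\frac{\eta}{\beta}\,e^{(\beta\xi/\eta)^2}\,W_\xi(\xi, \eta)
= 1 + \frac{2a\beta\xi}{\eta}\,\frac{U_0}{(b\xi/\eta)U_0 + U_1}
+ \frac{a}{\beta}\,\frac{b\,U_0^2 + U_1^2 - U_0 U_2}{\{(b\xi/\eta)U_0 + U_1\}^2}
\tag{24}
$$

and

$$
\frac{\sqrt{\pi}}{2}\,\eta\,e^{(\beta\xi/\eta)^2}\,W_\eta(\xi, \eta)
= -\frac{\beta\xi}{\eta}
- a\,\frac{\dfrac{b\xi}{\eta}\Big(1 + \dfrac{2\beta^2\xi^2}{\eta^2}\Big)U_0^2
- \Big(1 - \dfrac{2\beta^2\xi^2}{\eta^2}\Big)U_0 U_1
- \tfrac{1}{2}U_0 U_3 + \tfrac{1}{2}U_1 U_2}{\{(b\xi/\eta)U_0 + U_1\}^2}.
\tag{25}
$$

The first summand in the Nth bracket of $U_0(\xi, \eta)$ becomes effective when the inequality

$$\operatorname{erfc}((2N + \xi)/\eta) > 10^{-m} \tag{26}$$

is satisfied, or, when

$$(2N + \xi)/\eta < z, \tag{27}$$

where $\operatorname{erfc} z = 10^{-m}$. We call z the threshold root, whose values are shown in Table 1 for several values of threshold powers m. The second summand in the Nth bracket of $U_0(\xi, \eta)$ becomes effective when

$$(2N + 2 - \xi)/\eta < z. \tag{28}$$

Given ξ and η that satisfy ηz − ξ > 0, we find the maximum N among the non-negative integers that satisfy (27), and sum up (13) to obtain $U_0(\xi, \eta), \ldots, U_3(\xi, \eta)$. The second summand in the Nth bracket in (13) is simply added if it does not underflow.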
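The threshold convention can be sketched in a few lines. The code below is an illustration, not the author's Fortran program: it finds the threshold root z by bisection on the library erfc, picks N from (27), and sums (13) for k = 0 and k = 1 only, using $i^{0}\operatorname{erfc} x = \operatorname{erfc} x$ and $i^{-1}\operatorname{erfc} x = (2/\sqrt{\pi})\exp(-x^2)$. The function names are hypothetical.

```python
import math

def threshold_root(m, lo=0.0, hi=30.0):
    """Solve erfc(z) = 10**(-m) by bisection (erfc is decreasing)."""
    target = 10.0 ** (-m)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def i_erfc(k, x):
    """i^{-k} erfc x for k = 0 or 1 only."""
    if k == 0:
        return math.erfc(x)
    if k == 1:
        return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)
    raise ValueError("only k = 0 and k = 1 are implemented in this sketch")

def U(k, xi, eta, z):
    """Sum (13) up to the largest N allowed by inequality (27)."""
    if eta * z - xi <= 0.0:
        return 0.0                       # no summand exceeds the threshold
    N = int((eta * z - xi) / 2.0)        # largest N with (2N + xi)/eta < z
    sign = -1.0 if k == 0 else 1.0       # (-1)**(k-1) for k = 0, 1
    total = 0.0
    for n in range(N + 1):
        total += i_erfc(k, (2 * n + xi) / eta)
        total += sign * i_erfc(k, (2 * n + 2 - xi) / eta)  # dual summand
    return total

z10 = threshold_root(10)
print(threshold_root(3), z10, U(0, 0.1, 0.3, z10))
```

At early times (small η) only the first summand of the 0th bracket matters, so U_0 reduces to erfc(ξ/η), the semi-infinite domain value.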

The graph of W(ξ, const.) runs as shown in Figure 1. It cuts the ξ axis from below. Numerical computation shows that the graph is steadily increasing in the domain 0 < ξ < 1. If the tangent at a point on the curve in this domain cuts the ξ axis to the right of the origin, we may apply the Newton iteration to find the root ξ.

The graph of W(const., η) runs as shown in Figure 2. As η tends to ∞, the graph approaches the W axis from below. To find the root η we first discover, as shown in the figure, such points P and Q on the η axis that satisfy W(P)·W(Q) < 0, where Q needs to be located to the left of the minimum B. Let S be the regula falsi [2, 3], i.e. the intersection of the η axis with the straight line connecting points W(P) and W(Q). Interval PS is narrower in this figure than interval PQ; therefore, we use the former to locate the root R in this case. If the tangent drawn at point W(S) falls in the interval PS, we apply Newton iteration at S. Otherwise we subdivide the range PS by a new regula falsi, and repeat the procedure.
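A root finder in the spirit of this procedure, alternating a regula falsi step with a Newton step that is accepted only when it stays inside the current bracket, might look as follows. This is a generic sketch rather than the author's routine; the name `hybrid_root`, the stopping rule, and the test function x² − 2 are assumptions made for the example.

```python
import math

def hybrid_root(f, df, p, q, tol=1e-12, max_iter=100):
    """Find a root of f in [p, q], given f(p) * f(q) < 0.

    Each pass takes a regula falsi point S, shrinks the bracket,
    then tries a Newton step from S and keeps it only if the
    tangent meets the axis inside the bracket.
    """
    fp, fq = f(p), f(q)
    if fp * fq >= 0.0:
        raise ValueError("f(p) and f(q) must have opposite signs")
    for _ in range(max_iter):
        s = q - fq * (q - p) / (fq - fp)      # regula falsi point
        fs = f(s)
        if abs(fs) < tol:
            return s
        if fp * fs < 0.0:
            q, fq = s, fs
        else:
            p, fp = s, fs
        d = df(s)
        if d != 0.0:
            x = s - fs / d                    # Newton step from S
            if min(p, q) < x < max(p, q):     # tangent falls in the bracket
                fx = f(x)
                if abs(fx) < tol:
                    return x
                if fp * fx < 0.0:
                    q, fq = x, fx
                else:
                    p, fp = x, fx
    return 0.5 * (p + q)

root = hybrid_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 2.0)
print(root)
```

Keeping a sign-changing bracket at every step is what protects the iteration from the flat part of a curve such as W(const., η) near its minimum.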

In the 0th bracket, ξ/η, or, equivalently, $s_0$, is constant prior to the entrance of the second summand. The constant zone satisfies the condition

$$(2 - \xi)/\eta > z, \tag{29}$$

which defines a domain contiguous to the one found by letting N = 0 in (28). The end of the constant coefficient zone may therefore be defined by the solution of the simultaneous equations W(ξ, η) = 0 and ξ + ηz − 2 = 0. Table 1 shows the end of the constant coefficient zone for various threshold powers in the case of the freezing of water, $T_A$ = −5°C, $T_F$ = 0°C and $T_B$ = 5°C. The material constants used are: $K_I$ = 2.2180 J/(m s °C), $K_{II}$ = 0.5688 J/(m s °C), $\kappa_I$ = 1.15×10^{-6} m²/s, $\kappa_{II}$ = 1.44×10^{-7} m²/s, L = 3.35176×10^{5} J/kg. These values yield $\xi_\infty \approx 0.7959$.

On a renewed assumption that the domain is semi-infinite with temperature 5°C at x = ∞, we have computed, on the basis of the threshold powers shown in Table 1, temperatures at x = ℓ at the times when the interface reaches the end of the constant coefficient zone, which are included in the table. We accept that the temperature in a semi-infinite domain applies prior to a time at which one of the computed temperatures is deemed close enough to 5°C.
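In the constant coefficient zone the equation W = 0 reduces to a single equation for the ratio λ = ξ/η, since U_0 = erfc λ and U_1 = (2/√π) exp(−λ²). The sketch below solves it by bisection and then locates the end of the zone from ξ + ηz − 2 = 0. It relies on one assumption not stated in the text: a density of 1000 kg/m³, so that Lρ = 3.35176×10^8 J/m³. The function name is hypothetical, and the threshold root for m = 10 is copied from Table 1.

```python
import math

def constant_zone_ratio(KI, KII, kI, kII, L_rho, TA, TF, TB):
    """Root lam = xi/eta of W = 0 when only the first summand is active."""
    beta = math.sqrt(kII / kI)                          # (15)
    a = beta * (KI / KII) * (TF - TA) / (TB - TF)       # (16)
    b = 2.0 * L_rho * kII / (KII * (TB - TF))           # (17)
    c = 2.0 / math.sqrt(math.pi)

    def W(lam):
        u0 = math.erfc(lam)
        u1 = c * math.exp(-lam * lam)
        return (math.erf(beta * lam)
                - a * c * math.exp(-(beta * lam) ** 2) * u0 / (b * lam * u0 + u1))

    lo, hi = 1e-9, 3.0            # W < 0 near zero and W > 0 at 3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if W(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# L*rho assumes rho = 1000 kg/m^3 (not given in the text)
lam = constant_zone_ratio(2.2180, 0.5688, 1.15e-6, 1.44e-7,
                          3.35176e8, -5.0, 0.0, 5.0)
z = 4.572824967                   # threshold root for m = 10 (Table 1)
eta_end = 2.0 / (lam + z)         # from xi + eta*z = 2 with xi = lam*eta
xi_end = lam * eta_end
print(lam, xi_end, eta_end)
```

Under that density assumption the computed ratio and end-of-zone coordinates can be compared with the m = 10 row of Table 1.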

Table 2 shows the nondimensional interfacial coordinates for m = 10. The table also shows the differences of the temperature gradients at the terminals of both phases, a quantification for demonstrating the approach to the final steady temperature distribution.

The minimum absolute value of W(ξ, η) chosen in this computation for defining the root of the equation (14) is $10^{-10}$. If the power is higher than 10, the solution process described above does not necessarily converge because of the error bound in the subroutine for evaluating $E_Q(x)$, which we have defined through the formula

$$\operatorname{erfc} x = e^{-x^2}\cdot E_Q(x). \tag{30}$$

The subroutine $E_Q(x)$ produces 12 effective digits for any nonnegative x. The program uses the Erf x continued fraction [4] from x = 0 to 0.5, ERFC 5707 RATIONAL APPROXIMATION [5] from x = 0.5 to 8.0, and the erfc x continued fraction [4] from x = 8.0 to ∞.
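For large arguments the scaled function in (30) can be evaluated from the classical continued fraction √π e^{x²} erfc x = 1/(x + (1/2)/(x + 1/(x + (3/2)/(x + 2/(x + ...))))). The sketch below evaluates it from the inside out with a fixed number of terms; it is an illustration of the idea and is not claimed to be the expansion or the truncation rule used in the author's subroutine. The function name and the default number of terms are assumptions.

```python
import math

def scaled_erfc_cf(x, terms=200):
    """E_Q(x) = exp(x*x) * erfc(x) from a continued fraction, for x > 0.

    The fraction is truncated after `terms` partial quotients and
    evaluated backwards, so no overflow or underflow can occur.
    """
    f = x
    for k in range(terms, 0, -1):
        f = x + (k / 2.0) / f
    return 1.0 / (math.sqrt(math.pi) * f)

for x in (3.0, 8.0, 30.0):
    print(x, scaled_erfc_cf(x))
```

The scaled form is what keeps summands such as those in (13) usable when erfc itself would underflow; for moderate x the result can be compared directly with `math.exp(x * x) * math.erfc(x)`.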

The computer programs, which are written in Fortran 77 and run on the PRIME 9750, will be made available.

Threshold   Threshold                                                    T at x = ℓ in a
power m     root z         ξ_end          η_end          ξ_end/η_end     semi-infinite domain

 3          2.326753766    0.2386364804   0.7570046928   0.3152377821    4.529240866
 4          2.751063906    0.2056840653   0.6522261918   0.3153569542    4.770161762
 6          3.458910737    0.1671161673   0.5299020333   0.3153718175    4.941947352
 8          4.052237244    0.1444140153   0.4579164232   0.3153719935    4.984650553
10          4.572824967    0.1290340786   0.4091488161   0.3153719955    4.995826039
12          5.042029746    0.1177331889   0.3733152929   0.3153719956    4.998842966
15          5.675846347    0.1052780845   0.3338219183   0.3153719955    4.999826798

Table 1. End of the Constant Coefficient Zone for Various Threshold Powers.

The value 0.3153719956 is identical to ξ/η in the constant coef. zone.

The positive value marked by # shows that this is in the error bound.

Suffixes A, S, and B of the temperature gradients mean the cold side, interface, and warm side, respectively.
X_ℓ is the nondimensional space coordinate x/ℓ.


Abstract:

Solid substrates, when exposed to undesirable vapors, can experience temporary or permanent material damage by the adsorption of these vapors. However, the supply of thermal energy to the contaminants can put them back into the gaseous phase. In this analysis the effect of heating as a means of decontamination of substrates is examined in detail. In particular, the use of thin electrically conducting films imbedded in the substrate is considered as a heat source.

In the current development a one-dimensional model for infinitely large substrates is adopted. The thermal transport mechanisms include heat generation in the conductive layer, unsteady heat conduction through the solid substrate, heat of desorption in the adsorbed layer and thermal convection in the gaseous exterior. The mass transport of the contaminant takes place by diffusion in the substrate and desorption at the surface. These coupled phenomena are mathematically modeled by a set of governing differential equations and boundary conditions. This set of equations is solved numerically by finite-difference methods. The analysis predicts the temperature and contaminant concentration of a solid undergoing heating. Results for some special cases have been included to exhibit the typical behavior of such systems. For most cases the heating process decontaminates the solid. However, for some cases the heating increases the solubility of the contaminant and may increase its concentration after long periods.


Nomenclature

concentration of absorbed vapor within the solid
maximum vapor concentration that is dissolvable in the solid (this is related to the available absorption sites per unit volume)
specific heat
mass diffusion of vapor (in ambient gas or solid)
heater depth from outside surface
dimensionless adsorption depth [Equation (3.28)]
thermal conductivity
thickness of the slab
Lewis number (= α/D)
mole fraction of the contaminant vapor molecules
Nusselt number
partial pressure of vapor
volumetric heating rate
molar heat of adsorption for the first layer of adsorbed molecules
molar heat of condensation
molar heat of solution
universal gas constant
time
temperature
surface temperature
ambient temperature
dimensionless partial pressure (also equal to mole fraction)
coordinate normal to the slab


Greek Letters

α - thermal diffusivity
δ - thickness of heating element
Λ - coupling parameter governing the thermal transients of the slab surface [Equation (3.27)]
ε - solubility parameter which links surface coverage and bulk concentration at the surface [c - c8/(1 c 6)]
φ - property ratio
ρ - density
σ - dimensional surface coverage
σ_0 - available adsorption sites per unit area
θ - dimensionless surface coverage (= σ/σ_0)


Subscripts

0 - upper surface
1 - substrate material (plexiglass)
2 - heater material (Indium-Tin Oxide)
∞ - ambient far-stream quantity
D - mass diffusivity ratio
g - property of gas-vapor mixture
k - thermal conductivity ratio
L - lower surface
m - mass transfer
s - at the slab surface; property ratio between solids 2 and 1
t - thermal
v - contaminant vapor
α - thermal diffusivity ratio

INTRODUCTION 


The  problem  of  contamination  of  solids  by  chemical  vapors  is  one  of  important 
consideration  in  many  industrial  applications.  Typically,  industrial  equipment  exposed  to 
undesirable  vapors  will  undergo  contamination  by  the  vapors  being  adsorbed  on  solid 
surfaces.  After  long  periods  of  exposure,  absorption  of  the  vapor  into  the  solid  will  take 
place.  This  can  cause  deterioration  of  materials  such  as  plexiglass  windows  and  may 
result  in  the  entire  unit  being  temporarily  non-functional. 

In  the  present  analysis  we  examine  the  process  of  decontamination  by  heating  the 
plexiglass  substrates  with  imbedded  electrically  conducting  layers.  The  application  of 
such  heating  elements  for  the  purpose  of  deicing  is  well  known.  However,  little  is  known 
about  its  overall  effectiveness  for  the  removal  of  adsorbed  and  absorbed  contaminants. 
While  the  heat  input  supplies  energy  to  the  contaminant  molecules  and  sets  them  into  a 
free  state  (gaseous  state),  it  also  increases  the  solubility  of  the  contaminant  in  the  solid 
substrate.  It  is  therefore  necessary  to  carry  out  a  detailed  mathematical  analysis  to 
determine  the  effect  of  heating  on  adsorbed  and  absorbed  contaminants. 

In  the  current  study  we  make  some  fundamental  assumptions  relating  to  the 
adsorption/desorption  kinetics  and  then  develop  a  one-dimensional  model  for  the  removal 
of  physically  adsorbed  contaminants.  The  mathematical  analysis  provides  information  as 
to  what  data  are  needed  to  predict  the  performance  of  such  a  decontamination  system. 
At  the  same  time  some  typical  cases  have  been  run  to  simulate  such  predictions.  Also 
included  are  cases  for  which  the  increased  solubility  due  to  heating  may  cause  the 
contaminant  level  to  increase. 

Kinetic  Theory  of  Adsorption  and  Desorption 

According to Langmuir's [1] monolayer model, adsorption takes place at a constant heat of adsorption $Q_a$. The ratio of the occupied adsorption sites, $\sigma$, to the maximum available, $\sigma_0$, on a unit area is given by

$$\theta = \frac{\sigma}{\sigma_0} = \frac{k_3 p}{1 + k_3 p}, \qquad (2.1)$$

where $k_3$ is a constant and $p$ is the partial pressure of the vapor. This model for adsorption is suitable for low pressures. A model for multilayer adsorption was suggested by Brunauer, Emmett & Teller [2] in 1938. By simple accounting of the number of molecules in each layer, they gave

$$\theta = \frac{kx}{(1-x)(1-x+kx)}. \qquad (2.2)$$

Here $x = p/p_0$, where $p_0(T)$ is the equilibrium vapor pressure at temperature $T$, and $k = e^{(Q_a - Q_0)/RT}$. The heats of adsorption $Q_a$ and $Q_0$ correspond to monolayer and multilayer states, respectively. The total heat of adsorption is given by

$$Q_t(\theta) = \left[Q_a - (Q_a - Q_0)x\right]\theta\sigma_0/N = Q\,\theta\sigma_0/N, \qquad (2.3)$$

where $N$ is the Avogadro number and $Q$ is defined as

$$Q = Q_a - (Q_a - Q_0)x. \qquad (2.4)$$


The fundamental principle behind the decontamination lies in the behavior of $\theta$ with temperature. From (2.2) it is clear that for small $x$, $\theta$ is nearly proportional to $x$. However, $p_0(T)$ increases with $T$, and hence $x = p/p_0$ decreases with $T$ at a fixed partial pressure. Therefore $\theta$ decreases with $T$, indicating a reduction in the amount of contaminant adsorbed on the surface of the solid. It follows that a surface may be decontaminated by raising its temperature.
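
The temperature argument above can be checked numerically. The sketch below is only an illustration, not the authors' program. It evaluates (2.2) and assumes, beyond what the text states at this point, a Clausius-Clapeyron form for the vapor pressure, $p_0(T) \propto e^{-Q_0/RT}$. The function names, the reference temperature, and the sample values ($Q_a/RT_{ref} = 5$, $Q_0/RT_{ref} = 1$, $x = 0.5$ at the reference temperature) are illustrative assumptions.

```python
import math


def bet_coverage(x, k):
    """BET coverage theta of eq. (2.2) for relative pressure x = p/p0, 0 <= x < 1."""
    return k * x / ((1.0 - x) * (1.0 - x + k * x))


def relative_pressure(x_ref, q0_star, t_star):
    """x = p/p0(T) at T* = T/T_ref for a fixed partial pressure p.

    Assumes p0(T) = p0(T_ref) * exp(q0_star * (1 - 1/T*)), where
    q0_star = Q0 / (R * T_ref) is a hypothetical dimensionless heat.
    """
    return x_ref * math.exp(-q0_star * (1.0 - 1.0 / t_star))


def coverage_at(t_star, x_ref=0.5, qa_star=5.0, q0_star=1.0):
    """Coverage at temperature ratio T*; k = exp((Qa - Q0)/(R T))."""
    k = math.exp((qa_star - q0_star) / t_star)
    return bet_coverage(relative_pressure(x_ref, q0_star, t_star), k)


for t_star in (1.0, 1.2, 1.5):
    print(t_star, coverage_at(t_star))
```

Under these assumptions the printed coverage falls as the temperature ratio rises, which is the trend the argument relies on.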


In  the  next  section  we  formulate  the  governing  differential  equations  for  substrates 
electrically  heated  by  imbedded  conductive  layers. 
3. DECONTAMINATION OF SUBSTRATES BY ELECTRICAL HEATING: FORMULATION


Description  of  Problem 

Since decontamination of the surface takes place with a rise in the temperature of the substrate, the possibility of heating by imbedded elements (such as in an automobile windshield) is considered here. The application to defogging and deicing is well known [1, 4, 8-9]. We adopt here a one-dimensional model in which the heat and mass flow in the direction parallel to the plane of the substrate are considered negligible. At this point we can define a specific one-dimensional time-dependent problem.

Let us consider a long slab of thickness $L$ which is exposed to an environment containing a chemical vapor and some inert gases. An electrically heated layer of thickness $\delta$ is imbedded in the slab at a depth $h$ (see Fig. 3.1). The substrate is referred to as phase 1 and the heating element as phase 2. The chemical vapor deposits itself on the surface of the substrate by adsorption and then diffuses into the bulk of the substrate.

The  physico-chemical  processes  involved  are  as  follows: 

Mass  Transfer 

1.  Diffusion  and  convection  of  contaminant  vapor  in  the  environment  takes  place 
due  to  the  wind  pattern,  or  to  the  motion  of  the  substrate.  This  sets  up  a 
velocity  profile  near  the  surface  of  the  substrate.  As  a  result,  convective 
transport  of  the  contaminant  to  or  from  the  surface  takes  place. 

2.  Adsorption/desorption  of  the  vapor  at  the  solid  surfaces. 

3.  Diffusion  of  vapor  into  the  solid  phases  (1  and  2).  Chemical  reactions  within 
the  solid  are  not  being  considered. 
Heat Transfer


1.  Heat  release  by  electrical  heating  within  the  conductive  layer. 

2.  Heat  conduction  in  the  solid  phases  1  and  2. 

3.  Heat  associated  with  adsorption/desorption  at  the  surface. 

4.  Thermal  diffusion  and  convection  in  the  gaseous  environment. 


Assumptions 


1.  Fluid  flow  processes  are  considered  only  for  their  effects  on  heat/mass 
transfer.  Simple  lumped  parameter  models  (i.e.,  heat  and  mass  transfer 
coefficients)  are  to  be  used  for  these  processes. 

2.  The adsorbent (solid) is infinite in length. A one-dimensional formulation is used for the solid phase.

3.  A uniform volumetric heat generation rate $q'''$ is considered in the conductive layer of thickness $\delta$. Chemical reactions within the solid phase, and any latent heat release from them, are not considered.

4.  The  concentration  of  the  sorbed  species  within  the  solid  is  linearly  related  to 
the  surface  concentration  of  the  adsorbed  layer  within  certain  limits. 

5.  The solid surfaces are taken to have reached an adsorption equilibrium with the surrounding gas-vapor mixture. At this equilibrium, the "fraction" of the surface covered ($\theta$) depends on the surface temperature $T_s$ and the partial pressure $p_{v,s}$ of the vapor adjacent to the surface. Furthermore, at equilibrium, $p_{v,s}$ is purely a function of the surface temperature and the heat of adsorption. If $p_{v,s}$ is different from $p_{v,\infty}$ (the far-field partial pressure), then vapor transport takes place in the gaseous phase as well.

6.  For the present model, only physical adsorption is treated, for a multi-molecular adsorbed layer. The heat of adsorption for the first layer, $Q_a$, is different from that of the subsequent layers, for which it is taken to be the latent heat of condensation, $Q_0$. For the first layer $Q_a$ actually varies with the fraction covered, but here we take it to be the average value.

7.  The dependence of the partial pressure on temperature is given by

$$p_{v,s} = x\,p_{0,s} = x\,p_{0,\infty}\exp\left[\frac{Q_0}{R}\left(\frac{1}{T_\infty} - \frac{1}{T_s}\right)\right], \qquad (3.1)$$

where $x = m_{v,s}$ is the mole fraction of the vapor in the air adjacent to the surface.
Governing Equations


At this point the problem may be precisely cast into a mathematical form as a closed set of partial differential equations. With the assumptions made in §3.1, the equations for heat and mass balance are as follows:


Gas Phase

Heat transfer between solid surface and air:

$$q_g = h_{gt}\,(T_s - T_\infty) \qquad (3.2)$$

Vapor mass flux between solid surface and air:

$$j_{g,v} = h_{gm}\,\rho_g\,(m_{v,s} - m_{v,\infty}) \qquad (3.3)$$

where

$$h_{gt} = Nu_t\,k_g/L \qquad (3.4)$$

$$h_{gm} = Nu_m\,D_{gv}/L \qquad (3.5)$$

with

$Nu$ = Nusselt number
$k_g$ = thermal conductivity of gas
$m_v$ = mole fraction of the vapor
$D_{gv}$ = binary diffusion coefficient between vapor and air
$\rho$ = density.

Equations (3.2) and (3.3) are based on lumped parameter modeling of the convective transport. The Nusselt numbers $Nu_t$ and $Nu_m$ depend on the external flow conditions. The general characteristics of these Nusselt numbers are available in most heat transfer textbooks (see, e.g., Burmeister [2]).

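Solid-Gas Interface

Heat transfer at the interface:

$$q_g - q_a = q_s \qquad (3.6)$$

where

$$q_s = -k_1\left.\frac{\partial T_1}{\partial y}\right|_{y=0} \quad \text{or} \quad q_s = +k_1\left.\frac{\partial T_1}{\partial y}\right|_{y=-L} \qquad (3.7)$$

and $q_a$ is the heat release by adsorption.

Mass transfer at the interface:

$$j_{g,v} + j_a = j_{s,v} \qquad (3.8)$$

where

$j_a$ = mass rate of adsorption (g/cm2-sec)
$j_{s,v}$ = mass flux from the solid,

$$j_{s,v} = -D_{1v}\left.\frac{\partial c_{1v}}{\partial y}\right|_{y=0} \quad \text{or} \quad j_{s,v} = +D_{1v}\left.\frac{\partial c_{1v}}{\partial y}\right|_{y=-L} \qquad (3.9)$$

and $j_{g,v}$ is given by (3.3).

Adsorption Equilibrium

From (2.2),

$$\theta = \frac{kx}{(1-x)(1-x+kx)} \qquad (3.10)$$

where

$$k = e^{(Q_a - Q_0)/RT} \qquad (3.11)$$

and following (2.4) it can be seen that

$$Q = Q_0 + (Q_a - Q_0)(1-x). \qquad (3.12)$$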

By employing the quasi-equilibrium assumption we may write the adsorption heat flux as

$$q_a = \frac{\sigma_0}{N}\,Q\,\frac{d\theta}{dt} \qquad (3.13)$$

and

$$j_a = \frac{\sigma_0}{N}\,\frac{d\theta}{dt}\,M_v = m\,\sigma_0\,\frac{d\theta}{dt}, \qquad (3.14)$$

where $N$ is the Avogadro number, $\sigma_0$ is the maximum number of available sites per unit area, $M_v$ is the molecular weight of the vapor, and $m$ is the mass of one molecule of the vapor.

Solid Phase 1

Heat transfer:

$$\frac{1}{\alpha_1}\frac{\partial T_1}{\partial t} = \frac{\partial^2 T_1}{\partial y^2} \qquad (3.15)$$

where $\alpha_1$ is the thermal diffusivity and $T_1$ is the temperature.

Mass transfer:

$$\frac{1}{D_{1v}}\frac{\partial c_{1v}}{\partial t} = \frac{\partial^2 c_{1v}}{\partial y^2} \qquad (3.16)$$

where $D_{1v}$ is the diffusion coefficient and $c_{1v}$ is the mass concentration in g/cm3.

We assume a linear relationship between the superficial mass concentration that is adsorbed at the surface and the volumetric concentration adjacent to the surface. This is the basis of assumption 4 in §3.1. Thus at $y = 0$ and $y = -L$,

$$m\,\sigma_0\,\theta = \phi\,c_{1v} \qquad (3.17)$$

where $\phi$ is a constant with dimensions of length. This is called the penetration depth. The relationship (3.17) is used for cases when

$$\frac{m\,\sigma_0}{\phi} \ll c_0 \qquad (3.18)$$

where $c_0$ is the maximum possible concentration.


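Solid Phase 2

Heat transfer:

$$\frac{1}{\alpha_2}\frac{\partial T_2}{\partial t} = \frac{\partial^2 T_2}{\partial y^2} + \frac{q'''}{k_2} \qquad (3.19)$$

where $q'''$ represents the volumetric heat generation rate in cal/cm3-sec.

Mass transfer:

$$\frac{1}{D_{2v}}\frac{\partial c_{2v}}{\partial t} = \frac{\partial^2 c_{2v}}{\partial y^2} \qquad (3.20)$$

The boundary conditions between solid 1 and solid 2 at $y = -h$ and $y = -(h+\delta)$ are

$$T_1 = T_2, \qquad k_1\frac{\partial T_1}{\partial y} = k_2\frac{\partial T_2}{\partial y} \qquad (3.21)$$

and

$$c_{1v} = c_{2v}, \qquad D_{1v}\frac{\partial c_{1v}}{\partial y} = D_{2v}\frac{\partial c_{2v}}{\partial y}. \qquad (3.22)$$

The initial conditions are

$$T_1 = T_2 = T_\infty \ \text{at } t = 0, \qquad c_{1v} = c_{2v} = 0 \ \text{at } t = 0. \qquad (3.23)$$

The equations (3.1)-(3.23) form a closed set. However, it is convenient to identify dimensionless groups and make these equations non-dimensional.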
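Dimensionless Grouping

The dimensionless parameters are selected as follows:

$$Nu_t = \frac{h_{gt}\,L}{k_g} \qquad (3.24)$$

$$Nu_m = \frac{h_{gm}\,L}{D_{gv}} \qquad (3.25)$$

$$\epsilon = \frac{m\,\sigma_0}{\phi\,c_0} \qquad (3.26)$$

$$A = \frac{m\,\sigma_0}{L\,c_0}\cdot\frac{c_0}{\rho_1}\cdot\frac{R}{c_{p1}} \qquad (3.27)$$

$$H = \frac{m\,\sigma_0}{L\,c_0} \quad \text{(dimensionless adsorption depth)} \qquad (3.28)$$

$$Q_0^* = \frac{Q_0}{R\,T_\infty} \qquad (3.29)$$

$$Q_a^* = \frac{Q_a}{R\,T_\infty} \qquad (3.30)$$

$$\rho_0^* = \frac{\rho_{0\infty}}{c_0} \quad \text{(ratio of saturation density at infinity to maximum solubility in solid 1)} \qquad (3.31)$$

$$x_\infty = \text{vapor mole fraction at infinity} \qquad (3.32)$$

$$q^* = \frac{q'''\,L^2}{k_2\,T_\infty} \quad \text{(dimensionless volumetric heating rate)} \qquad (3.33)$$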

mo0 

t  = - (dimensionless  penetration  depth) 

(3.34) 

♦c0 

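$$\phi_{kg} = k_g/k_1 \qquad (3.35)$$

$$\phi_{Dg} = D_{gv}/D_{1v} \qquad (3.36)$$

$$Le_1 = \alpha_1/D_{1v} \qquad (3.37)$$

$$\phi_{\alpha s} = \alpha_2/\alpha_1 \qquad (3.38)$$

$$\phi_{Ds} = D_{2v}/D_{1v} \qquad (3.39)$$

$$\phi_{ks} = k_2/k_1 \qquad (3.40)$$

$$T^* = T/T_\infty \qquad (3.41)$$

$$c_{1v}^* = c_{1v}/c_0 \qquad (3.42)$$

$$y^* = y/L, \quad h^* = h/L, \quad \delta^* = \delta/L, \quad t^* = \alpha_1 t/L^2 \qquad (3.43)$$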

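By combining (3.2), (3.6), (3.7) and (3.13) we obtain

$$Nu_t\,\phi_{kg}\,(T_s^* - 1) - A\,\frac{d\theta}{dt^*}\left[Q_0^* + (Q_a^* - Q_0^*)(1-x)\right] = \mp\frac{\partial T_1^*}{\partial y^*} \quad \text{at } y = 0 \text{ or } y = -L. \qquad (3.44)$$

Similarly, by combining (3.1), (3.3), (3.8), (3.9) and (3.14) we find

$$Nu_m\,\phi_{Dg}\,\rho_0^*\left[x\,e^{Q_0^*(1 - 1/T_s^*)} - x_\infty\right] + Le_1\,H\,\frac{d\theta}{dt^*} = \mp\frac{\partial c_{1v}^*}{\partial y^*} \quad \text{at } y = 0 \text{ or } y = -L. \qquad (3.45)$$

In both relations the upper sign applies at $y = 0$ and the lower sign at $y = -L$.

The adsorption equilibrium given by (3.10) and (3.12), when non-dimensionalized, gives

$$\theta = \frac{kx}{(1-x)(1-x+kx)} \quad \text{with} \quad k = e^{(Q_a^* - Q_0^*)/T_s^*}. \qquad (3.46)$$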


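The remaining equations may be written as

$$\frac{\partial T_1^*}{\partial t^*} = \frac{\partial^2 T_1^*}{\partial y^{*2}} \qquad (3.47)$$

$$\frac{\partial c_{1v}^*}{\partial t^*} = \frac{1}{Le_1}\frac{\partial^2 c_{1v}^*}{\partial y^{*2}} \qquad (3.48)$$

$$\frac{\partial T_2^*}{\partial t^*} = \phi_{\alpha s}\left[\frac{\partial^2 T_2^*}{\partial y^{*2}} + q^*\right] \qquad (3.49)$$

$$\frac{\partial c_{2v}^*}{\partial t^*} = \frac{\phi_{Ds}}{Le_1}\frac{\partial^2 c_{2v}^*}{\partial y^{*2}} \qquad (3.50)$$

with boundary conditions

$$c_{1v}^* = \epsilon\,\theta \quad \text{at } y^* = 0, -1 \qquad (3.51)$$

$$T_1^* = T_2^*, \quad \frac{\partial T_1^*}{\partial y^*} = \phi_{ks}\frac{\partial T_2^*}{\partial y^*} \quad \text{at } y^* = -h^*, -(h^* + \delta^*) \qquad (3.52)$$

and

$$c_{1v}^* = c_{2v}^*, \quad \frac{\partial c_{1v}^*}{\partial y^*} = \phi_{Ds}\frac{\partial c_{2v}^*}{\partial y^*} \quad \text{at } y^* = -h^*, -(h^* + \delta^*) \qquad (3.53)$$

and initial conditions

$$T_1^* = T_2^* = 1 \quad \text{at } t^* = 0 \qquad (3.54)$$

$$c_{1v}^* = c_{2v}^* = 0 \quad \text{at } t^* = 0. \qquad (3.55)$$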

We next examine the magnitudes of the various dimensionless parameters so as to identify the relatively important transport mechanisms.

4.  IDENTIFICATION OF THE IMPORTANT TRANSPORT MECHANISMS

In this section we first state the approximate range of physical properties and external conditions for typical situations of practical interest. This is followed by an estimate of the magnitudes of the dimensionless groups.
Physical Properties

Plexiglass substrate (Poly Methyl Methacrylate)

Property                 Symbol        Value at 378K     Range              Units
Density                  $\rho_1$      1.19              1.0                g/cc
Specific heat            $c_{p1}$      0.37              .5                 cal/g-K
Thermal conductivity     $k_1$         5x10^-4           1-10x10^-4         cal/cm-sec-K
Thermal diffusivity      $\alpha_1$    1.14x10^-3        1-10x10^-3         cm2/sec
Mass diffusivity (O2)    $D_{1v}$      1x10^-8           10^-6 - 10^-8      cm2/sec

Air at 20°C

Property                     Symbol        Value           Units
Thermal conductivity         $k_g$         0.6267x10^-4    cal/cm-sec-K
Thermal diffusivity          $\alpha_g$    0.2216          cm2/sec
Mass diffusivity (O2, N2)    $D_g$         0.2             cm2/sec

Electrically Conducting Layer (Indium Oxide + Stannic Oxide)

Property                 Symbol        Value           Units
Density                  $\rho_2$      6.3             g/cc
Specific heat            $c_{p2}$      0.2             cal/g-K
Thermal conductivity     $k_2$         0.0136          cal/cm-sec-K
Thermal diffusivity      $\alpha_2$    1.08x10^-2      cm2/sec

Mass diffusivity: $D_{2v}$ not available (consider: $D_{2v}/D_{1v}$ = 10^-1, 1 and 10)
Thickness of layer: $\delta$ typically 10^-3 cm (consider: 1-10 x 10^-3 cm)
Heats of Adsorption

The ambient temperature is taken to be $T_\infty$ = 20°C (293 K). Assuming $Q_a$ = 3 kcal/g-mole, we obtain

$$Q_a^* = Q_a/RT_\infty = 5.2.$$

With this as an approximate estimate, a range $Q_a^*$ = 1-25 is considered. Similarly, with $Q_0$ = 1 kcal/g-mole, $Q_0^*$ = 1-5 is considered.

Heating Levels

The maximum heating levels quoted in the literature are of the order of 4 cal/cm2-sec. Therefore, the volumetric heating rate (for $\delta$ = 10^-3 cm), in dimensionless form, is

$$q^* = \frac{q'''\,L^2}{k_2\,T_\infty} \approx 60.$$

We consider $q^*$ = 1-100.


Solubility Parameter

The solubility parameter $\epsilon$ is estimated as follows. At the surface we have $c_{1v}^* = \epsilon\theta$. As $\theta$ takes on large values (say, 10) the solid phase will also approach saturation ($c_{1v}^* \to 1$). Therefore, we take $\epsilon$ = 1/10 = 0.1.

Adsorption Depth

The dimensionless adsorption depth, $H = m\sigma_0/Lc_0$, is estimated by examining the solubility of N2 in poly(ethyl methacrylate). At 25°C the solubility is

s = 7.5 x 10^-2 c.c. gas (STP)/c.c. substrate,

where s is defined by $c_{1v} = sp$. Assuming a partial pressure of p = 0.1 atm, we find the volumetric concentration $c_{1v}$ to be

$c_{1v}$ = 7.5 x 10^-3 c.c. (STP)/c.c.

In units of mass concentration this is

$c_{1v}$ = 9 x 10^-6 g/c.c. ≈ 10^-5 g/c.c.

We therefore take $c_0$ = 10^-5 g/c.c. If the gas is more soluble, $c_0$ may be as large as 10^-3 g/c.c. For very low solubilities it may be 10^-7 or 10^-8 g/c.c.

Dimensionless Groups

The maximum number of available sites is of the order of $\sigma_0$ = 10^14/cm2. The parameter H is therefore given by

$$H = \frac{m\,\sigma_0}{L\,c_0} \approx 10^{-4},$$

where L is taken to be 1 cm, and m = 4.67 x 10^-23 g for N2.

The dimensionless parameter A is given by

$$A = \frac{m\,\sigma_0\,R}{L\,\rho_1\,c_{p1}} = H\,\frac{c_0\,R}{\rho_1\,c_{p1}} \approx 10^{-9}.$$

From these estimates it is clearly seen that surface transients ($d\theta/dt$ terms) would be negligibly small. Furthermore, due to the very small mass diffusivity of the solid (10^-8 - 10^-7 cm2/sec) compared to that in the gaseous phase (0.1 cm2/sec), the mass transfer relationship between the surface and the ambient reduces to the quasi-steady relation $c_{1v}^* = \epsilon\theta$, where $\theta$ is the surface coverage at the adsorption quasi-equilibrium.

The above linear relationship can be generalized by considering saturation phenomena within the solid, and a Langmuir-type absorption isotherm, $c^* = \epsilon\theta/(1 + \epsilon\theta)$, can be employed. The results of the estimates of these dimensionless groups are summarized in Table 1.
Dimensionless Parameter    Range             Comments

$Nu_t$                     O(100)            Included in the model
$Nu_m$                     O(100)            This is large, but irrelevant since gas-phase mass transfer does not affect problem
$A$                        10^-9             Negligible
$Q_0^*$                    1-5               Included
$Q_a^*$                    1-25              Included
$q^*$                      1-100             Included
$H$                        10^-6 - 10^-1     Irrelevant
$\epsilon$                 <1                Strongly temperature dependent. Consider values 10^-4 - 1
$\delta^*$                 0.001             Consider 0.001, 0.01, 0.1
$h^*$                      0 < h* < 1
$\phi_{kg}$                0.05 - 0.5
$\phi_{Dg}$                10^5 - 10^8
$Le_1$                     10^2 - 10^5
$\phi_{\alpha s}$          1 - 10
$\phi_{ks}$                10 - 100
$\phi_{Ds}$                —                 No data available, try 0.1, 1, 10

Table 1: Summary of estimates of magnitudes of dimensionless parameters
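
The entries of Table 1 can be spot-checked from the property values quoted in this section. The sketch below is an illustration under stated assumptions, not part of the original analysis: the variable names are ours, the molar volume of an ideal gas at STP (22414 c.c./mole) and the gas constant 1.987 cal/mole-K are standard values we supply, $D_{1v}$ is taken as 10^-8 cm2/sec, and R in the group A is assumed to be the gas constant per unit mass of N2.

```python
R_MOLAR = 1.987          # cal/(mole K), universal gas constant
M_N2 = 28.0              # g/mole, molecular weight of N2
V_STP = 22414.0          # c.c./mole, ideal-gas molar volume at STP (assumed)

# Property values as quoted in the section (CGS-calorie units).
k1, alpha1, rho1, cp1 = 5e-4, 1.14e-3, 1.19, 0.37
kg, d_gas = 0.6267e-4, 0.2
k2, alpha2 = 0.0136, 1.08e-2
d1v = 1e-8               # cm2/sec, assumed value for the substrate
t_inf, length = 293.0, 1.0
m_molecule, sigma0, c0 = 4.67e-23, 1e14, 1e-5

phi_kg = kg / k1                         # gas/substrate conductivity ratio
phi_ks = k2 / k1                         # heater/substrate conductivity ratio
phi_alpha_s = alpha2 / alpha1            # heater/substrate diffusivity ratio
lewis_1 = alpha1 / d1v                   # Lewis number of the substrate
qa_star = 3000.0 / (R_MOLAR * t_inf)     # Qa = 3 kcal/g-mole
adsorption_depth = m_molecule * sigma0 / (length * c0)            # group H
group_a = adsorption_depth * c0 * (R_MOLAR / M_N2) / (rho1 * cp1)  # group A
c_mass = 7.5e-3 * M_N2 / V_STP           # 7.5e-3 c.c.(STP)/c.c. in g/c.c.

print(phi_kg, phi_ks, phi_alpha_s, lewis_1)
print(qa_star, adsorption_depth, group_a, c_mass)
```

With these inputs the ratios fall inside the ranges of Table 1, apart from the Lewis number, which sits at the upper edge because of the assumed $D_{1v}$.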
5. RESULTS AND DISCUSSION


The non-dimensional governing differential equations, together with the simplifications discussed in §4, have been programmed for a finite-difference solution. The results from the various sets of data have been plotted in Figs. 5.1-5.5. Here we discuss each case in detail.
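
As a rough illustration of such a finite-difference treatment (not the authors' program), the sketch below marches the dimensionless heat equation explicitly in time. The simplifications are ours: the heater is collapsed to a plane source of strength $q^*\delta^*\phi_{ks}$ at $y^* = -h^*$, the substrate diffusivity is used throughout, the heat of adsorption at the surface is neglected, and each face exchanges heat with the gas through a Biot number $Nu_t\phi_{kg}$. Function names, grid size and parameter values are assumptions chosen for the example.

```python
import numpy as np


def steady_faces(h, source, bi0, bil):
    """Steady face temperatures from a flux balance across the slab.

    Linear profiles on each side of the source plane give
    (T_L - 1) / (T_0 - 1) = (1 + bi0*h) / (1 + bil*(1 - h)).
    """
    ratio = (1.0 + bi0 * h) / (1.0 + bil * (1.0 - h))
    t0 = 1.0 + source / (bi0 + bil * ratio)
    return t0, 1.0 + ratio * (t0 - 1.0)


def march(h=0.5, source=2.5, bi0=50.0, bil=2.5, n=51, t_end=3.0):
    """Explicit (FTCS) march of dT/dt = d2T/dy2 + source * delta(y + h)."""
    dy = 1.0 / (n - 1)
    dt = 0.2 * dy * dy               # within the explicit stability limit here
    temp = np.ones(n)                # T* = 1 everywhere at t* = 0
    src = np.zeros(n)
    src[int(round((1.0 - h) / dy))] = source / dy    # node at y* = -h
    for _ in range(int(t_end / dt)):
        lap = np.empty(n)
        lap[1:-1] = (temp[2:] - 2.0 * temp[1:-1] + temp[:-2]) / dy**2
        # Convective faces via ghost nodes: index 0 is y* = -1, index -1 is y* = 0.
        lap[0] = 2.0 * (temp[1] - temp[0] - dy * bil * (temp[0] - 1.0)) / dy**2
        lap[-1] = 2.0 * (temp[-2] - temp[-1] - dy * bi0 * (temp[-1] - 1.0)) / dy**2
        temp = temp + dt * (lap + src)
    return temp


profile = march()
print(profile[-1], profile[0], steady_faces(0.5, 2.5, 50.0, 2.5))
```

The defaults correspond to $h^* = 0.5$, $q^*\delta^*\phi_{ks} = 2.5$ and Biot numbers 50 and 2.5. An explicit scheme is used only for brevity; an implicit scheme would allow larger time steps.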


In Fig. 5.1, the temperature profile at various times is shown. The heater is placed at $y^*$ = -0.25. The thermal parameters for this plot are $q^*$ = 50.0, $\delta^*$ = 0.001, $\phi_{ks}$ = 50.0, $Nu_{t0}$ = 200.0 and $Nu_{tL}$ = 10.0. The plot shows the following important features.

1.  The region near $y^*$ = 0 reaches a steady state faster than the rest of the substrate. This is because of the higher Nusselt number and the shorter distance from the heating element.

2.  More heat leaves through the surface $y^*$ = 0 than through $y^*$ = -1. This is owing to the lower thermal resistance of the region $-h^* < y^* < 0$ relative to that of $-1 < y^* < -h^*$.

3.  A large Nusselt number causes the corresponding surface to be cooler and, as a result, leads to higher adsorption. The heating is therefore wasteful. If $Nu_t$ is controllable, then it should be minimized so that very high heating levels are not needed.

4.  The maximum steady temperature in this case is $T_{max}/T_\infty$ = 1.5. Assuming $T_\infty$ = 300K, we have $T_{max}$ = 450K. At such high temperatures the plexiglass would deteriorate.


In Fig. 5.2 the dimensionless surface coverage of the contaminant on the outside ($\theta_0$) and on the inside ($\theta_L$) are shown as functions of time. The parameters for the graph are:

           $h^*$    $\delta^*$    $\phi_{ks}$    $\phi_{kg}$    $q^*$    $Nu_{t0}$    $Nu_{tL}$    $\phi_{\alpha s}$
Run I      0.5      0.0001        10             0.05           50       200          10           1
Run II     0.25     0.001         50             0.25           50       200          10           5
Run III    0.5      0.001         50             0.25           50       10           10           5
Run IV     0.5      0.001         50             0.25           10       2            2            5

In addition we use $Q_a^*$ = 5.0, $Q_0^*$ = 1.0, $p_{v0}$ = 0.5 and $p_{vL}$ = 0.01. The following features are observed.

1.  Curves I and II, between which there is no systematic change in the thermal parameters, both reach the same steady state value for $\theta_0$ and $\theta_L$. This is because the groups

$$Q_1 = \frac{1 + Nu_{t0}\,\phi_{kg}\,h^*}{1 + Nu_{tL}\,\phi_{kg}\,(1 - h^*)} \qquad (5.1)$$

and

$$Q_2 = \frac{q^*\,\delta^*\,\phi_{ks}}{\phi_{kg}\,(Nu_{t0} + Nu_{tL}\,Q_1)} \qquad (5.2)$$

have approximately the same values for these curves.

2.  Curve II corresponds to a larger thermal diffusivity than curve I. It therefore approaches a steady state faster. For curve IV the Nusselt number is lower than for curve III. The eigenvalues determining the rate of thermal transport are smaller for curve IV and the transport process lasts longer in this case.

3.  The higher level of adsorption on the outside is due to the larger partial pressure of the vapor on the outside.


In Fig. 5.3, the effects of the partial pressure of the vapor and the heats of adsorption on the fraction covered are shown. The plots correspond to fixed values of the parameters $Q_1$ and $Q_2$ ($Q_1$ = 11.5556; $Q_2$ = 0.0317), or equivalently to fixed values of the parameters

$$T_0 = 1 + Q_2 \qquad (5.3)$$

and

$$T_L = 1 + Q_1 Q_2. \qquad (5.4)$$

1.  We find that $\theta_L < \theta_0$ because $T_L > T_0$. This is due to the convective cooling which occurs on the outside.

2.  Each curve shows a steep linear portion for small pressures, a flat portion for moderate pressures, and saturation ($\theta \to \infty$) as the partial pressure approaches saturation.

3.  At large $Q_a^*$, the linear and the flat portions are separated around $\theta$ = 1. This implies that for large $Q_a^*$ (such as in chemisorption) a monolayer is formed first until $\theta$ = 1. Subsequently, more layers build up. For $Q_a^* \approx Q_0^*$, all layers may be formed simultaneously.

4.  At large $Q_0^*$, the saturation phenomenon is delayed. This happens because the parameter $k = e^{(Q_a^* - Q_0^*)/T^*}$, which signifies the ratio of adsorption times between the first and the subsequent layers, decreases.
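
The steady thermal groups (5.1) and (5.2) and the face temperatures (5.3) and (5.4) are simple to evaluate. The sketch below is an illustration only; the function name is ours, and the parameter set ($h^*$ = 0.5, $\delta^*$ = 0.001, $\phi_{ks}$ = 50, $\phi_{kg}$ = 0.25, $q^*$ = 50, $Nu_{t0}$ = 200, $Nu_{tL}$ = 10) is an assumed combination that happens to yield the two values quoted for Fig. 5.3, 11.5556 for the ratio $Q_1$ and 0.0317 for $Q_2$.

```python
def thermal_groups(h, delta, phi_ks, phi_kg, q, nu_t0, nu_tl):
    """Return (Q1, Q2) of eqs. (5.1)-(5.2) for the steady thermal state."""
    q1 = (1.0 + nu_t0 * phi_kg * h) / (1.0 + nu_tl * phi_kg * (1.0 - h))
    q2 = q * delta * phi_ks / (phi_kg * (nu_t0 + nu_tl * q1))
    return q1, q2


# Assumed parameter set (see the lead-in); not stated for Fig. 5.3 in the text.
q1, q2 = thermal_groups(h=0.5, delta=0.001, phi_ks=50.0, phi_kg=0.25,
                        q=50.0, nu_t0=200.0, nu_tl=10.0)
t_outer = 1.0 + q2          # eq. (5.3), face at y* = 0
t_inner = 1.0 + q1 * q2     # eq. (5.4), face at y* = -1
print(round(q1, 4), round(q2, 4), t_outer, t_inner)
```

For this set the inner face is the hotter one, consistent with $\theta_L < \theta_0$ in item 1.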


In Fig. 5.4, the concentration profile within the solid is plotted. The solubility has been assumed to be the same in both the substrate and the heating element. Also the ratio of the two diffusivities is taken to be unity.

The parameters are: $h^*$ = 0.5, $\delta^*$ = 0.001, $\phi_{\alpha s}$ = 5.0, $\phi_{Ds}$ = 1.0, $\phi_{ks}$ = 50.0, $Le_1$ = 500, $\phi_{kg}$ = 0.25, $\phi_{Dg}$ = 5.0 x 10^5, $Q_a^*$ = 5.0, $Q_0^*$ = 1.0, $q^*$ = 50.0, $Nu_{t0}$ = 10.0, $Nu_{tL}$ = 200.0, $\epsilon$ = 0.1, $p_{v0}$ = 0.5 and $p_{vL}$ = 0.01.

We observe from the plot that:

1.  The time taken to reach mass transfer steady state is approximately equal to $Le_1$ x (time for thermal steady state).

2.  For short times, diffusion occurs from both ends and the effects from each end grow independently until they interact.

3.  As time increases, the concentration profile becomes monotonic and a straight line is obtained for constant mass diffusivity. At steady state, a steady stream of vapor diffuses from the higher concentration side to the lower concentration side.

4.  The surface concentrations are given by

$$c_0 = \epsilon\theta_0/(1 + \epsilon\theta_0) \quad \text{at } y^* = 0 \qquad (5.5)$$

and

$$c_L = \epsilon\theta_L/(1 + \epsilon\theta_L) \quad \text{at } y^* = -1. \qquad (5.6)$$

Here $\theta_0$ and $\theta_L$ are functions of the thermal parameters $Q_1$ and $Q_2$, the heats of adsorption $Q_a^*$ and $Q_0^*$, and the partial pressures $p_{v0}$ and $p_{vL}$. Thus after the thermal steady state is reached, $c_0$ and $c_L$ remain constant due to thermal equilibrium.


In Fig. 5.5, we have plotted the variation in the steady state bulk concentration ($c_{bulk} = \int_{-1}^{0} c^*\,dy^*$) as well as $T_{max}$ as a function of $q^*$. The important feature incorporated here is that the solubility parameter $\epsilon$ is taken to be temperature dependent. It is given by

$$\epsilon = \epsilon_0\,e^{Q_s^*(1 - 1/T^*)} \qquad (5.7)$$

where $Q_s^* = Q_s/RT_\infty$ is the heat of solution. Since the solubility changes with changing temperature, the heat of solution plays a role. The program was modified to include this additional parameter. The plot corresponds to the following values of the parameters:

$h^*$ = 0.8, $\delta^*$ = 0.001, $\phi_{ks}$ = 50.0, $\phi_{kg}$ = 0.25, $Nu_{t0}$ = 200.0, $Nu_{tL}$ = 10.0, $\epsilon_0$ = 0.1, $Q_a^*$ = 10.0, $Q_0^*$ = 1.0, $p_{v0}$ = 0.5, $p_{vL}$ = 0.01, $Q_s^*$ = 0.0-1.0.

The results exhibit the following features:

1.  For very low heat of solution, $Q_s^*$, the bulk concentration decreases monotonically with the heating level $q^*$. It may be noted that the duration of heating does not determine the level of contamination after the attainment of thermal and mass transfer steady states. For this reason neither the level of initial contamination nor the initial temperature affects $c_{bulk}$.

2.  For a reasonably large heat of solution, while the surface coverage ($\theta_0$ and $\theta_L$) decreases with increasing temperature, the dissolved contaminant in the bulk increases. This usually happens when the heating levels are low and the changes of $\theta_0$ and $\theta_L$ are not as rapid as that of the solubility. If the outside environment is very cold and if the convective cooling is strong, the interior of the substrate may be very hot, but the surface will remain fairly cool. As a result, the heating may not substantially remove the surface contaminant and at the same time it will increase the solubility. This will lead to increased contamination if heating is sustained for long periods.

3.  An important consideration for design would be the maximum temperature reached:

$$T_{max}^* = \frac{T_0 + T_L\,h^*/(1-h^*) + q^*\,\delta^*\,h^*\,\phi_{ks}}{1 + h^*/(1-h^*)}$$

If $T_\infty$ = 300K, then for a material such as plexiglass, we would require $T_{max}^*$ to stay below about 1.5 to avoid thermal damage.
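
The two design quantities discussed for Fig. 5.5, the maximum temperature and the temperature-dependent solubility (5.7), can be evaluated together. The sketch below is a hedged illustration: the function names are ours, and the sample face temperatures (1.0317 and 1.3662) are assumed values taken from a steady state with $h^*$ = 0.5, $q^*$ = 50, $\delta^*$ = 0.001 and $\phi_{ks}$ = 50, not from the figure.

```python
import math


def max_temperature(t0, tl, h, q, delta, phi_ks):
    """Dimensionless temperature at the heater plane from the T_max expression."""
    r = h / (1.0 - h)
    return (t0 + tl * r + q * delta * h * phi_ks) / (1.0 + r)


def solubility(eps0, qs_star, t_star):
    """Temperature-dependent solubility parameter of eq. (5.7)."""
    return eps0 * math.exp(qs_star * (1.0 - 1.0 / t_star))


# Assumed sample state (see the lead-in).
t_peak = max_temperature(t0=1.0317, tl=1.3662, h=0.5, q=50.0,
                         delta=0.001, phi_ks=50.0)
eps_hot = solubility(eps0=0.1, qs_star=0.5, t_star=t_peak)
print(t_peak, eps_hot, t_peak > 1.5)
```

For this assumed state the peak temperature ratio exceeds 1.5, so by the criterion in item 3 the heating level would need to be reduced. The solubility parameter at the heater plane is also larger than $\epsilon_0$, which is the mechanism behind item 2.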
Fig. 3.1. One-dimensional model of substrate with imbedded heating element.

Fig. 5.2. Dimensionless surface coverage at various times.

Fig. 5.5. Steady state average volumetric concentration $c_{bulk}$ and maximum temperature, as functions of heating level $q^*$. Note that for $Q_s^*$ = 0.1-0.5 an increase in heating level $q^*$ from 0 to about 25 actually increases the contaminant level in the steady state.
EFFECTIVE VISCOSITY

Abstract.  The equations of motion for the Poiseuille flow of a particle-laden fluid mixture are solved for a symmetric flow in a channel. We assume a formula for mixture viscosity as derived by Graham. For mixtures with particle concentrations below the effective maximum packing, this results in a flow with the particles concentrated at the core and a clear fluid layer at the walls, separated by a transition layer where the particle concentration varies between the packed value in the core and zero in the clear layer. This flow structure leads to a flow-pressure drop relation which can be interpreted as an "effective viscosity" for the mixture in Poiseuille flow. This points out the difficulty in measuring the mixture viscosity using such a flow. If the walls are porous, the flow leaking through the walls leads to a force opposing the formation of the transition layer. If this force is sufficiently large, the clear layer does not form and particles can foul the wall, reducing the effectiveness of this device as a filter.

Introduction.  The  two-fluid  model  for  the  flow  of  a  dispersed 
mixture  is  based  on  equations  of  conservation  of  mass  and 
momentum  for  each  material.  These  equations  are  assumed  to 
govern  the  multidimensional  motion  of  such  mixtures,  provided  the 
correct  constitutive  equations  are  supplied.  One  test  of  the 
constitutive  equations  for  a  multidimensional  flow  is  plane 
parallel  flow,  where  the  equations  should  be  able  to  predict  the 
velocity  profiles  and  the  concentration  across  the  channel. 

Equations of Motion.  The mass and momentum equations for the flow of a particle-laden fluid mixture are:

(1)  $\frac{\partial \alpha}{\partial t} + \nabla\cdot\alpha\,v_p = 0$

(2)  $\frac{\partial (1-\alpha)}{\partial t} + \nabla\cdot(1-\alpha)\,v_f = 0$

(3)


ftvp 

•  'p[at  *  VP  '  *  VP 


■  “V«  *  Pp  ♦  (ppi  *  Pp)  * 
>  •  S  (vf  -  vp) 


+  ■  Pi  c 


vm 


f»Vf 

l*t 


+  Vf  •  *  Vf 


]  f 


avp 

at  4  vf 


vp 
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+  «  Pf  L  (vp  -  Vf)  -»  ("  -*vf)  +  FF  +  »  •  «Ip 
~  "  ■  ’  Ipi  +  *  *  ■  IpT  "  «  *p  0 


+  «  Pf  Cvm 


^  ♦ 
at 


8vf 

at~  +  v* 


+  «  Pf  L  [vf  -  vp)  -»(»-*  vf)  -  Ff 

♦  "  *  (l  -  •)  if  +  *  «  •  ifi 
+  *  -  (l  -  «)  j.fT  -  (l  -  «)  Pf  g 

Here $\alpha$ is the volume fraction of particles, $\rho_p$ is the particle density, $\rho_f$ is the fluid density, $v_p$ is the particle velocity, $v_f$ is the fluid velocity, $p_p$ is the particle pressure, $p_{pi}$ is the pressure at the particle interface, $p_f$ is the fluid pressure, $p_{fi}$ is the fluid pressure at the interface, $S$ is the drag coefficient, $C_{vm}$ is the virtual mass coefficient, $L$ is the lift coefficient, $F_f$ is the Faxen force, $g$ is the acceleration due to gravity, $\tau_p$ is the particle shear stress, $\tau_{pi}$ is the particle shear stress at the interface, $\tau_f$ is the fluid shear stress, $\tau_{fi}$ is the fluid shear stress at the interface, $\tau_p^T$ is the particle turbulent shear stress, and $\tau_f^T$ is the fluid turbulent shear stress.

The virtual mass and lift forces were calculated by Drew and Lahey (1986) by considering the force on a single sphere accelerating relative to an inviscid fluid which is undergoing a pure shear plus a rotation. This results in $C_{vm}$ = 1/2 and $L$ = 1/2. For our present purposes, we shall assume that the virtual mass-lift combination is objective. This forces the choice $L = C_{vm}$.

The  Faxen  force  is  comparable  to  the  viscous  forces  in  the 
fluid  phase.  This  term  is  not  usually  included  in  two-phase  flow 
models,  and  consequently  its  form  is  not  common  knowledge.  We 
assume 

(5)  Ff  =  k  «  Uf  »2  vf  . 
The pressure terms are very important for wave propagation in bubbly flows. Single sphere calculations such as the ones used to determine the forms of the virtual mass and lift suggest:

(6)  $p_p = p_{pi} = p_{fi} = p_f - \xi\,\rho_f\,|v_f - v_p|^2$.

Stuhmiller (1977) gives $\xi$ = 1/4 for inviscid flow.

The  stress  terms  are  most  difficult  to  model.  If  the 
particles  are  small  and  rigid,  they  are  essentially  stress 
transmitters  on  the  microscale.  Thus,  we  assume  that  the 
particle  stress  and  the  interfacial  stresses  are  the  same. 
Therefore, 

(7)  $\tau_p = \tau_{pi} = \tau_{fi}$.

The  stress  which  the  particles  are  transmitting  corresponds  to 
the  extra  needed  to  make  the  mixture  more  viscous.  To  account 
for  this  effect,  we  take 

<B>  ip  j  =  P  (a)  r i  . 

For the viscous stress, we take Ishii's (1975) form, which is
derived  from  averaging  the  microscopic  viscous  stress  tensor. 
This  gives: 

(9)  If  =  u  j[*  vf  *  (*  v<)T] 

(vf  -  v,,)  *  o 

The  Reynolds  stresses  are  responsible  for  diffusive  effects 
in  the  momentum  balance.  For  the  Reynolds  stress  in  the  fluid, 
we take a simple form of a model proposed by Drew and Lahey
(1979) which has coefficients which can be calculated from
inviscid flow around a single sphere (Lamb, 1933). This gives:

(10a)  $\tau_f^{Re} = \alpha \rho_f a |v_f - v_p|^2 I + \alpha \rho_f b (v_f - v_p)(v_f - v_p)$

For  the  Reynolds  stress  in  the  particle  phase,  we  assume  that  the 
particles  follow  closely  the  motions  of  the  fluid.  This  leads 
to: 

(10b)  $\tau_p^{Re} = \rho_p c |v_f - v_p|^2 I + \rho_p d (v_f - v_p)(v_f - v_p)$

Plane Poiseuille Flow. Let us specify the flow conditions. We
wish to examine the symmetric flow in a channel of width 2h. We
shall  neglect  gravitational  forces.  Then  the  no  slip  condition 
on  the  fluid  gives  vf  =  0  and  the  condition  of  impenetrability  of 
the  wall  to  particles  gives  n  •  vp  =  0  at  y  =  ±  h.  We  also 
impose  conditions  that  the  total  flow  is  given,  and  the 
concentration  of  the  incoming  fluid  is  known. 


♦ 


(vf  -  vp) 
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(11a)  J_h  i  •  vf  dy  =  2  h  VD 


(11b)  $\int_{-h}^{h} i \cdot v_p \, dy = 2 h U_0$

(11c)  $\int_{-h}^{h} \alpha(y) \, dy = 2 h \langle\alpha\rangle$.

For plane parallel flow, let

(12a)  $v_p = U(y) i$

(12b)  $v_f = V(y) i$

(12c)  $\alpha = \alpha(y)$.

Then the continuity equations are satisfied automatically, and
the momentum equations yield


“3>  0  .  -  .  .  s  (V  -  u)  ♦  .  «  gS  t  ^ 

* .  k  u  ^ 

dy 

(14)  0  =  -  (1  -  «)  ~~  +  ■  S  (U  -  V)  +  U  ^ 


dvl 

dy  J 


.  d  V  d«  .  dV 
•  In  p  u  -j— 

dyJ  dy  dy 


(lfi)  0  -  -  a  ♦  2  a  *  Pf  (V  -  U)  (U  V) 


«y 


+  a  Pf  L  (U-V]g(  a  pp  c  (U  -  V) 


(16)  0  =  -  (1  -  a)  +  {  pf  (V  -  U)  *  4~  +  «  Pf  L  (V  “  U) 


«y 


dy 


+  d7  B  a  (U  ~  V)* 


,  and 


dV 

dy 
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It  is  straightforward  to  show  that 
(17)  $p_f = -\frac{\Delta p}{\Delta x} x + P(y)$,

where $\Delta p$ is the imposed pressure drop on the channel and
$\Delta x$ is its length. The interfacial force terms can be
eliminated by adding (13) and (14). This gives


(18)  $-\frac{\Delta p}{\Delta x} = \frac{d}{dy}\left[ (1 - \alpha) \mu \frac{dV}{dy} + \beta \alpha \mu \frac{dV}{dy} \right]$.

If $\beta = 7/2$, eq. (18) is the Einstein (1906) formula for
effective viscosity. We shall use a form given by Graham [8] as

(19)  $\mu_{eff}/\mu = 1 + \frac{5}{2}\alpha + \frac{9}{4} \left[ \frac{1}{1 + (h/2a)} \right] \left[ \frac{1}{(h/a)} - \frac{1}{1 + (h/a)} - \frac{1}{[1 + (h/a)]^2} \right]$,

where, for a simple cubic packing,

$h/a = 2 \left[ \left(1 - (\alpha/\alpha_m)^{1/3}\right) / (\alpha/\alpha_m)^{1/3} \right]$,

where $\alpha_m$ is the experimentally determined maximum packing of
spheres. This form agrees with Einstein's for small $\alpha$ and with
that derived by Frankel and Acrivos (1967), which agrees with data
for larger concentrations.
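The viscosity form (19) is easy to evaluate directly. The following is a minimal sketch, not part of the original analysis: the function names and the default maximum packing `alpha_m = 0.62` are illustrative assumptions, and the relation $\mu_{eff}/\mu = 1 + (\beta - 1)\alpha$ used to recover $\beta$ is inferred from the remark that $\beta = 7/2$ reproduces Einstein's formula.

```python
def graham_relative_viscosity(alpha, alpha_m=0.62):
    """Relative viscosity mu_eff/mu from formula (19).

    alpha   : particle volume fraction, 0 < alpha < alpha_m
    alpha_m : maximum packing (0.62 is an assumed illustrative value)
    """
    if not 0.0 < alpha < alpha_m:
        raise ValueError("alpha must lie strictly between 0 and alpha_m")
    ratio = (alpha / alpha_m) ** (1.0 / 3.0)
    s = 2.0 * (1.0 - ratio) / ratio  # the spacing ratio h/a for simple cubic packing
    bracket = 1.0 / s - 1.0 / (1.0 + s) - 1.0 / (1.0 + s) ** 2
    return 1.0 + 2.5 * alpha + 2.25 * bracket / (1.0 + 0.5 * s)


def beta_from_viscosity(alpha, alpha_m=0.62):
    """Stress multiplier beta, assuming mu_eff/mu = 1 + (beta - 1) * alpha."""
    return 1.0 + (graham_relative_viscosity(alpha, alpha_m) - 1.0) / alpha
```

For small `alpha` the extra term is negligible and `beta_from_viscosity` approaches 7/2, while both quantities grow without bound as `alpha` approaches `alpha_m`, which is the behaviour the large-$D$ analysis relies on.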


The relative motion between the particles and the fluid can
be obtained from either of the remaining momentum equations in
the x-direction. If we divide equation (13) by $\alpha S$, we have


(20) 


U  -  V 


i  ^R 

S  *x 


u  _d 
5  dy 


0>  ♦  k) 


dV 
dy  ’ 


The momentum equations in the y-direction are instrumental in
determining the distribution of particles across the channel.
These equations involve the transverse pressure gradient dP/dy.
The transverse pressure gradient can be eliminated from (15) and
(16) by subtracting $\alpha/(1 - \alpha)$ times eq (16) from eq (15).
This gives


(21)  0  =  [c  +  c 


*f 


(U  -  V) 


d  (U  -  V) 

dy 


*  r^r  L  (u  -  v)  57  + 


+  [  -  (t  ♦  «)  J-?7T  +  c]  '*  (U  -  V)” 

We can further eliminate $U - V$ by using equation (20). This
gives
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(22) 


F,  +  _  __  «  1  2_u  d*  f.  .  dV  _  «  L  dV 

lC  i-«Js  dy7  ^  dy  1  -  ■  dy 


[  -  (<  ♦  •)  H~t  H  [§  £  - 1 57  O’  ♦  *0g] 


d« 

dy 


Equations (16) and (22) govern the fluid velocity profile and the
concentration of particles in the channel. The appropriate
boundary conditions are

(23a)  $V(h) = 0$

(23b)  $\int_{-h}^{h} V(y) \, dy = 2 h V_0$

(23c)  $\int_{-h}^{h} \alpha(y) \, dy = 2 h \langle\alpha\rangle$.


Let us nondimensionalize the problem by

(24a)  $w = V(y)/V(0)$

(24b)  $\zeta = y/h$.


The  equations  become 


(25) 


(26) 


(«♦(•-  o  •}  £ 


-  R  t 


[*  ♦ «  -  rM  « r  (•  ♦  •<)  £ 


[  -  (t  ♦  *)  j-r 


+  C 


dw 

4.  •  n 

a 

_  dw 

d& 

▼  *  U 

1  + 

a  dt 

-*■ 

•o£] 

da 

dc 

where

(27a)  $R = \frac{\Delta p \, h^2}{\mu \, \Delta x \, V(0)}$

is the channel Reynolds number, and

(27b)  $D = \frac{S h^2}{\mu}$

is the dimensionless drag per unit velocity.
The boundary conditions are
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(28*)  m(0)  -  1 

(28b)  $\frac{dw}{d\zeta}(0) = 0$

(28c)  $w(1) = 0$

(28d)  $\int_0^1 \alpha(\zeta) \, d\zeta = \langle\alpha\rangle$.

Approximate Solution. Let us seek a solution for $D$ large. The
outer solution assumes $\zeta = O(1)$. With $D$ large, we must have
$\beta$ large, or $dw/d\zeta$ small, or $\alpha$ small. From eq. (25),
we see that if $R = O(1)$ and $dw/d\zeta$ is small, $\beta$ must be
large. With $dw/d\zeta = D^{-q} g$ and $\beta = D^{p} b'$, we have
$\alpha = \alpha_m + D^{-p} \alpha'$, and $b' = (9/8)(\alpha_m/\alpha')$.
In order to obtain a balance in eq. (26), we must have $q = 1$.


Let us assume that the region with $\alpha$ small is near the wall,
and that the region with $dw/d\zeta$ small is in a region around the
center of the channel, which we shall call the core. Then in the
core we have $\alpha = \alpha_m$. Furthermore, the approximate particle
concentration is given by


(29)  $\alpha = \alpha_m$ for $0 < \zeta < \zeta^*$, and $\alpha = 0$ for $\zeta^* < \zeta < 1$,

where $\zeta^*$ is the location of the edge of the core. In the clear
fluid region, the fluid velocity must satisfy


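(30)  $\frac{dw}{d\zeta} = -R \zeta$,

so that

(31)  $w = -\frac{1}{2} R [\zeta^2 - 1]$.

At $\zeta^*$, we have $w(\zeta^*) = -(R/2)(\zeta^{*2} - 1) = 1$. Thus we
see that $R = 2/(1 - \zeta^{*2})$. If $1 - \zeta^* = O(1)$, we see that
$R = O(1)$. Since $\langle\alpha\rangle = \alpha_m \zeta^*$, we have

(32)  $R = \frac{2}{1 - (\langle\alpha\rangle/\alpha_m)^2}$.

Note that $R$ is not small when $\langle\alpha\rangle$ is near $\alpha_m$.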
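The leading-order outer solution can be tabulated in a few lines. This is a sketch under the assumptions of this section only (a packed core with $w = 1$ surrounded by clear fluid, and the transition layer ignored); the function name and the sample values in the usage note are hypothetical.

```python
def outer_solution(mean_alpha, alpha_m):
    """Leading-order core / clear-fluid structure from eqs (29)-(32).

    Returns (zeta_star, R, w), where w(zeta) is the dimensionless
    fluid velocity: 1 in the core, parabolic in the clear layer.
    """
    if not 0.0 < mean_alpha < alpha_m:
        raise ValueError("mean_alpha must lie strictly between 0 and alpha_m")
    zeta_star = mean_alpha / alpha_m      # from <alpha> = alpha_m * zeta_star
    R = 2.0 / (1.0 - zeta_star ** 2)      # eq (32)

    def w(zeta):
        zeta = abs(zeta)                  # the flow is symmetric about zeta = 0
        if zeta <= zeta_star:
            return 1.0                    # plug flow: the packed core does not shear
        return 0.5 * R * (1.0 - zeta ** 2)  # eq (31)

    return zeta_star, R, w
```

For instance, `outer_solution(0.31, 0.62)` places the edge of the core near the channel half-way point, and the returned profile is continuous there and vanishes at the wall.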


The Transition Layer. The crux of the argument is whether a
layer exists at $\zeta = \zeta^*$ where $\alpha$ makes a transition from
$\alpha_m$ to 0, while $dw/d\zeta$ goes from 0 to $-R \zeta^*$.

We let $\zeta = \zeta^* + D^{-p} \zeta'$. The right hand side of eq (25)
becomes $-R \zeta^*$ to first approximation. Using this, we obtain a
balance in eq (26) for $p = 1/2$, and $dw/d\zeta'$ can be eliminated
to give
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<33) 


d*° 
dc  '  2 


♦  f  t  («  ) 


0 


where 

(34a) 


f  4  <«  ) 


f "  («) 

«  *  *>  1  -  .  ♦ 

«=] 

1 

f(«) 

\,  ,  r  ,  «  1 

2  • 

r  c  a  1  -  m\ 

(34b) 


f,  («) 


_ 1 _ 

2  ♦  c  -*  nr] 


_1 _ 1 _ 1 _ 

1  -  *  1  +(P  -  1).  *'<“> 


(34c) 


f  (a  ) 


(>  4  k _ 

1  -»  (p  -  l)  « 


It  is  not  known  whether  the  solutions  to  eq  (33)  exhibit  the 
proper  behavior.  A  numerical  solution  to  equation  (33)  with  a  = 
-  1/5,  c  =  -  1/5,  and  c  =  1/4  is  shown  in  Fig  1. 

When $\zeta^* = 1 - o(1)$, the clear-fluid layer no longer exists,
and the wall lies in the transition layer. In this case, $R$ need
not be $O(1)$. The analysis in this case is straightforward, and
is given in Drew (1986). Figure 2 is a plot of $R$ versus $\langle\alpha\rangle$.
The equations for the Poiseuille flow of a particle-fluid mixture
in a channel result in a flow with a strong structure. The
structure which occurs consists of a core of particles which are
sufficiently "packed" so that they cannot shear, surrounded by a
clear-fluid layer where all the shear occurs. The presence of
this  structure  has  several  implications.  First,  if  such  a 
structure  occurs  in  a  flow,  one  must  be  careful  in  interpreting 
measurements  of  fundamental  quantities  such  as  mixture  viscosity. 
Measurements  of  properties  such  as  viscosity  usually  assume 
uniform  conditions.  Clearly,  a  flow  with  such  a  structure  is  not 
uniform.  The  quantity  measured  may  be  strongly  dependent  on  the 
structure.  Second,  this  flow  is  not  useful  for  measuring  any  of 
the  terms  in  the  equations  of  motion  that  are  responsible  for  the 
separation  of  particles  and  fluid,  because  the  flow  is  so 
degenerate  that  none  of  these  terms  are  really  acting  during 
this  motion. 

Finally,  we  note  that  the  presence  of  such  a  structure  in 
this  flow  may  allow  the  efficient  filtration  of  such  a  mixture 
using a flow-through membrane device. The situation is
essentially  plane  Poiseuille  flow,  with  a  small  amount  of  fluid 
drawn  through  the  walls,  which  are  assumed  to  be  porous.  Since 
the  flow  pushes  particles  away  from  the  walls,  if  fluid  can  be 
drawn  through  the  walls  slowly  enough  that  the  transition  layer 
is  not  disrupted,  then  no  particles  will  stick  to  the  wall  to 
impede  the  further  flow  of  fluid. 
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SOME  REMARKS  ON  BLOW-UP  IN  THE  STEFAN  MODEL  FOR 
PHASE  TRANSITIONS  AND  THE  HELE-SHAW  PROBLEM 


S.D.  Howison 
Oxford  University 


Abstract 

The classical Stefan model for melting or solidification
and the closely related Hele-Shaw model can in certain
circumstances exhibit irregular or ill-posed behaviour.
Examples of this behaviour are given, and the effectiveness of
several smoothing modifications of the models is discussed.

1.1  Introduction 


The simplest model¹ for the solidification or melting of a
pure substance is the classical 2-phase Stefan model which in
suitable dimensionless variables is described by the equations

$\frac{\partial u}{\partial t} = \nabla^2 u$    (1.1)

in the phase regions $S(t)$ (solid) and $L(t)$ (liquid), with

$u = u_m$    (1.2)

and

$\left[ \frac{\partial u}{\partial n} \right]_S^L = -\lambda V_n$    (1.3)

on the phase-change boundary $\Gamma(t)$ separating $S$ from $L$. Here
$u(x,t)$ represents the material temperature, $u_m$ is the fixed
melting temperature, $\lambda$ is the dimensionless latent heat and
$\partial/\partial n$ is the derivative normal to $\Gamma$ from $S$ to $L$
at a point whose speed in that direction is $V_n$; it is assumed that
the material properties of $S$ and $L$ are the same. The formulation
is completed by appropriate initial conditions $u_0(x)$ and boundary
conditions on the edge of the region in consideration or at infinity.

We note here two special cases of this general problem.
Firstly, if the diffusion coefficient in the solid is negligible
we obtain a 1-phase problem with $u = u_m$ in $S(t)$; here² $u$ may
more realistically be thought of as the concentration of a dissolved
substance diffusing through $L(t)$ with solidification or melting
on $\Gamma$ at an equilibrium concentration $u_m$. Secondly, if in
addition the diffusion is fast compared to the timescale imposed
by $\lambda$, we may ignore the $\partial/\partial t$ term in (1.1) and
replace (1.1) by Laplace's equation. This is known as the Hele-Shaw
problem (in two dimensions) and is also equivalent to flow of a
viscous liquid through a porous medium¹; $u$ here represents the
pressure in the liquid.

With the additional assumption that $u > u_m$ in the liquid and
$u < u_m$ in the solid, the basic Stefan model is known to be
well-posed at least for small times. Nonetheless, there are some
circumstances in which related problems exhibit irregular
(ill-posed) behaviour, and three of the more important of these
are:

(1)  In the presence of superheating or supercooling,

(2)  When there is volumetric heating,

(3)  When impurities are present,

and we examine these in turn. The one-phase Stefan problem is
correspondingly ill-posed if the phase in which $u$ is not
constant is superheated or supercooled, while the Hele-Shaw
problem is well-posed if the fluid region is expanding,
and ill-posed if it is contracting.

1.2  Supercooling  and  superheating 

A liquid is supercooled if its temperature is less than $u_m$;
we deal only with supercooling here since it is more common and
since the corresponding results for superheating can be obtained
by reversing the sign of $u - u_m$.

The  solution  to  the  Stefan  problem  with  supercooling  can 
blow  up  in  finite  time  in  two  ways. 

(a)  Sherman blow-up⁵

Under certain conditions it is possible to show that the
whole phase boundary $\Gamma$ may move with infinite speed at a finite
time $t^* < \infty$, and that there is no solution to the problem for
$t > t^*$. This form of blow-up is known to occur in one-dimensional
and radially symmetric geometries, and its cause is that
the total energy stored in the system in the form of latent
heat is insufficient to raise the supercooled liquid to its
melting point. Consider for example a finite solid region in
which $u = u_m$ immersed in liquid whose temperature at infinity is
$u_\infty$; blow-up in finite time can be shown to occur whenever
$u_\infty - u_m < -\lambda$ (undercoolings of up to $-2\lambda$ can be
achieved with certain materials). To show this, let

$Q(t) = \int_{L(t)} (u - u_\infty) \, dx$,

which is positive for suitable initial data $u_0(x) \ge u_\infty$; on
the other hand, using (1.1)-(1.3),

$dQ/dt$ = (the rate at which the area of $S(t)$ increases) $\cdot \, (u_\infty - u_m + \lambda)$,

so that if the solution exists for all $t$ and $u_\infty - u_m < -\lambda$,
then $Q \to -\infty$. This contradiction shows that blow-up must occur
at a finite time $t^* < \infty$. If in addition the problem has planar,
cylindrical or spherical symmetry we see that the velocity of $\Gamma$
must become infinite at $t = t^*$. This kind of blow-up is not,
however, possible for the Hele-Shaw problem since it relies on the
$\partial/\partial t$ term in (1.1).

(b)  Cuspidal  blow-up 

The Sherman blow-up with $V_n \to \infty$ on all $\Gamma$ has hitherto
only been shown to occur in symmetric geometries. Nevertheless, the
argument leading to blow-up given above does not depend upon
symmetry (until the last line, which argues only that $V_n \to \infty$).

The form taken by blow-up when $\Gamma$ is not symmetric is thought
to be via a cusp in $\Gamma$ with infinite $V_n$ at its tip; this is
known to be the case for the Hele-Shaw problem. Although this may
seem like a pointwise version of Sherman blow-up, it in fact
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occurs  even  if  the  conditions  leading  to  Sherman  blow-up  do  not 
hold,  that  is  even  if  there  is  no  deficiency  of  latent  heat  in 
the system. Indeed, it is likely that the set of initial value
problems for supercooled liquids or receding Hele-Shaw flows
which do not blow up in this way is small.⁴,¹⁰ Complex variable
methods can be used to investigate blow-up in Hele-Shaw flows in
some detail;¹⁰ typically a 3/2-power cusp forms in the moving
boundary  and  the  solution  ceases  to  exist  at  that  time,  although 
in  certain  special  circumstances  other  kinds  of  cusp  can  appear 
momentarily  without  this  non-existence.  All,  however,  have 
infinite  fluid  velocities  at  their  tips,  and  will  thus  be 
prevented  in  practice  by  surface  tension  and  inertial  effects. 

Even if blow-up does not occur, the moving boundary is
unstable to small perturbations which for large wavenumber $n$
grow as $e^{|n|t}$; consequently the morphology of the moving
boundary may be complicated. A possible situation here both for
Stefan and Hele-Shaw is an array of parallel 'fingers' as shown
in Fig. 1; we will return to this point later.


Fig.  1 

1.3  Volumetric  Heating 

If we impose a volumetric heating $Q$, so that
$\partial u/\partial t = \nabla^2 u + Q$ in each phase, the liquid
particle whose temperature first reaches $u_m$ must either become
superheated or remain at that temperature for a time $\lambda/Q$ until
it has acquired enough energy to change phase.¹¹,¹² In the first of
these cases the classical Stefan model has the difficulties described
above; the second is not even possible within the classical framework.
This situation has no direct analogy with any Hele-Shaw type flow
since it is inherently a two-phase problem and the term
$\partial u/\partial t$ in (1.1) is essential to the argument just
given; nevertheless there is a remote similarity with the squeeze
film problem described in ref. 6, but we do not pursue this point here.
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1.4  Impurities 


The simplest model for solidification of a dilute binary
alloy consists of equations (1.1), (1.3) for the heat flow,
together with diffusion of the impurity in solid and liquid
phases. The diffusion problems are coupled through the
conditions on the phase boundary, and (1.2) is replaced by a
relationship between the melting temperature on the interface
and the concentration there. We do not go into the details
save to remark that in most practical situations the model
predicts 'constitutional supercooling,' and that in the absence of
surface tension the interface may be linearly unstable¹³ in the
same way as a supercooled liquid, with a linear growth rate
approximately $e^{|n|t}$. We therefore conjecture that both cuspidal
and Sherman blow-up are possible although this remains unproven.
Again there is no analogy with a Hele-Shaw type flow.

2.  Regularisations of the models

The two kinds of blow-up described above are physically
unrealistic. The simple model (1.1)-(1.3) (or its amended
versions) is plainly inadequate, and we seek to modify it in such
a way as to incorporate hitherto neglected physical effects and/or
to render it mathematically better behaved. We describe
three such modifications, two based on extra physics and one
more mathematically motivated.

2.1  Surface  tension 

Surface energy effects may be incorporated into (1.2) via
the Gibbs-Thomson condition

$u = u_m(1 - \gamma\kappa)$ on $\Gamma$,    (2.1)

where $\kappa$ is the appropriately signed curvature of $\Gamma$ and
$\gamma$ is a dimensionless surface tension. This has a dramatic effect
on the linear stability of $\Gamma$ in that only a finite band of larger
wavelengths is now unstable, and it almost certainly prevents
cuspidal blow-up both for Stefan and Hele-Shaw, although this
has not yet been rigorously shown for either problem. On the
other hand, examples can be given where the Sherman blow-up
is not prevented by surface tension. This can be demonstrated
in, for example, a spherical geometry using a version of the
argument given in section 1.2; the physical interpretation is
that the energy stored in $\Gamma$ is not sufficient to materially alter
the energy imbalance which is the reason for this form of
blow-up. A version of this argument can be carried through for
regions without symmetry, and we conjecture that, for the
1-phase Stefan problem at least, the only way to avoid Sherman
blow-up is for $S(t)$ to split into infinitely many disjoint
components, each a sphere of radius $R$, where $2\gamma/R = |u_\infty|$.
This seems the only plausible equilibrium configuration; it bears
some resemblance to the ripening process described by
Glicksman.




2.2  Kinetic  Undercooling 


A second approach is to modify (1.2) by introducing a
kinetic undercooling on $\Gamma$, so that the melting temperature is
now

$u = u_m - V_n/\mu$;    (2.2)

this represents the fact that the interface departs slightly
from thermodynamic equilibrium. It is not, however, used for
Hele-Shaw flows as its physical basis there has not been
established.

With this condition on $\Gamma$ the Stefan model avoids both Sherman
and cuspidal blow-up; it works because the kinetic term $-V_n/\mu$
allows a greater energy transfer across $\Gamma$ when $V_n$ is large,
which is a stabilizing process. Its only practical limitation
is that $\mu$ is usually so large that $V_n$ must be about 10 m/sec
before $V_n/\mu$ is significant. Kinetic undercooling was doubtless
significant in the experiments of Glicksman with $u_\infty - u_m < -\lambda$.
Both  (2.1)  and  (2.2)  are  special  cases  of  the  phase  field  model 
of  Caginalp11. 


2.3  Weak  Solutions 


A modification which is more purely mathematical in its
approach is the idea of a weak solution; it works particularly
well for Stefan problems involving volumetric heating. We
rewrite (1.1)-(1.3) (with heating) in the form

$\frac{\partial h}{\partial t} = \nabla^2 u + Q$    (2.3)

where $h$ is the enthalpy, defined by $h = u + \lambda H(u - u_m)$, $H$
being the Heaviside function. Equation (2.3) is to be interpreted in
the sense of distributions, and this can lead to solutions which
are not consistent with the classical formulation (1.1)-(1.3).
Thus for instance the solid particle whose temperature first
reaches $u_m$ remains at that temperature while its enthalpy
increases continuously from $u_m$ to $u_m + \lambda$. Neighbouring
particles also have this behaviour, and the result is a
'mushy region' in which $u = u_m$ but $h$ varies. This
formulation is well suited to numerical solutions since no
special treatment is necessary to follow the free boundaries
(solid-mush and mush-liquid, or solid-liquid). Its physical
interpretation can, on the other hand, be a difficulty, one
approach¹² being to regard the mush as a mixture of liquid and
superheated regions, each of the latter being small enough to be
stabilized by surface tension. The enthalpy method does not,
however, explicitly incorporate the effects of the surface
energy stored in $\Gamma$ into the definition of $h$, and a more
realistic definition would take account of the variation of this
energy as the solid volume fraction changes; this might involve a
model of a ripening process similar to that described by Glicksman.
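The remark that the enthalpy form needs no front tracking can be illustrated with a short numerical sketch. It assumes one space dimension, an explicit finite-difference scheme on a uniform grid, and insulated ends; the function names, the grid, and the parameter values are illustrative assumptions and are not taken from the text.

```python
import numpy as np


def temperature_from_enthalpy(h, u_m, lam):
    """Invert h = u + lam*H(u - u_m): mush (u_m <= h <= u_m + lam) sits at u_m."""
    return np.where(h < u_m, h, np.where(h > u_m + lam, h - lam, u_m))


def enthalpy_step(h, dx, dt, u_m, lam, Q):
    """One explicit step of dh/dt = d2u/dx2 + Q with zero-flux ends."""
    u = temperature_from_enthalpy(h, u_m, lam)
    # Ghost cells copy the end values, so no heat crosses the boundaries.
    u_pad = np.concatenate(([u[0]], u, [u[-1]]))
    laplacian = (u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2]) / dx**2
    return h + dt * (laplacian + Q)


def run(h0, dx, dt, n_steps, u_m=0.0, lam=1.0, Q=1.0):
    """Advance the enthalpy n_steps; stability needs dt <= dx**2 / 2."""
    h = np.asarray(h0, dtype=float).copy()
    for _ in range(n_steps):
        h = enthalpy_step(h, dx, dt, u_m, lam, Q)
    return h, temperature_from_enthalpy(h, u_m, lam)
```

Starting from a uniformly cold solid, the heating raises the enthalpy steadily while the temperature stalls at `u_m` for a time `lam / Q`, which is the mushy behaviour described above; the free boundaries are never located explicitly, they are simply read off from where `h` crosses `u_m` and `u_m + lam`.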

Finally we note that attempts to find a weak formulation for
the binary alloy problem (a worthwhile goal in view of its
potential numerical effectiveness) have not hitherto succeeded.
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3.  Conclusion 


The basic Stefan model (1.1)-(1.3) is mathematically well
understood, but it is physically unrealistic in that it predicts
finite time blow-up of two kinds. The corresponding Hele-Shaw
model suffers from only one of these.

Surface  tension  is  probably  an  effective  regularisation  in 
all  but  the  extreme  situation  of  Sherman  blow-up.  Nevertheless, 
there  is  little  rigorous  mathematics  on  this  version  of  the 
problem,  and  some  subtle  and  interesting  questions  remain  to  be 
answered.  Among  these  are  to  explain  the  mechanism  by  which 
cusps  are  prevented  and  the  question  of  the  selection  of  the 
width  of  dendrites  in  an  array  such  as  that  of  fig.  1. 

Kinetic undercooling is also an effective regularisation for
Stefan problems but only comes into play at high interface
speeds.

Probably  the  safest  condition  for  Stefan  problems  to  take  is 
the  combination  of  surface  tension  and  kinetic  undercooling 

$u = u_m(1 - \gamma\kappa) - V_n/\mu$,

and this condition is discussed by Caginalp15.  For Hele-Shaw
problems the term $V_n/\mu$ should be ignored.

Weak solutions work well for volumetric heating but their
extension  to  include  surface  energy  and  impurity  effects  has  yet 
to  be  accomplished. 
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GLOBAL  OPTIMIZATION  USING  AUTOMATIC  DIFFERENTIATION 

AND  INTERVAL  ITERATION 


L. B. Rall
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Abstract.  Algorithms  are  presented  which  find  one  or  all  of  the  critical  points  of  a 
smooth  function  in  a  rectangular  region,  or  the  critical  points  at  which  the  function  has 
a  maximum  or  minimum  value.  If  no  critical  points  of  the  function  exist  in  the  given 
region,  then  the  algorithm  verifies  this  fact.  The  computation  is  self-validating,  in  that 
the  existence  or  nonexistence  of  critical  points  is  established  conclusively,  and  guaranteed 
upper  and  lower  bounds  are  computed  for  all  quantities  of  interest,  including  the  values 
of  the  gradient  vector  and  Hessian  matrix  of  the  function.  The  algorithms  make  use  of  an 
existing  implementation  of  automatic  differentiation  and  interval  computation.  Numerical 
results  are  given. 

AMS  (MOS)  Subject  Classifications:  65K10,  65G10,  68Q40 

Key  Words:  Global  unconstrained  optimization,  Critical  points,  Automatic  differentiation, 
Interval  iteration,  Self-validating  computation 

1.  Preliminaries.  This paper presents an algorithm for global, unconstrained
optimization of a smooth (at least twice differentiable) function $f : R^n \to R$, that is,

(1.1)  $f(x) = f(x_1, x_2, \ldots, x_n)$.

As is well-understood, this also includes the case of optimization of a function $\phi : R^m \to R$
subject to $n - m$ smooth constraints

(1.2)  $g_i(x_1, x_2, \ldots, x_m) = 0$,  $i = 1, 2, \ldots, n-m$,

by formation of the function

(1.3)  $f(x) = \phi(x_1, \ldots, x_m) + \sum_{i=1}^{n-m} x_{m+i} \cdot g_i(x_1, \ldots, x_m)$,

where the new variables $x_{m+1}, \ldots, x_n$ are simply the Lagrange multipliers for the problem.
No special properties of $f$, such as convexity, are assumed.

Sponsored  by  the  United  States  Army  under  Contract  No.  DAAG29-80-C-0041. 
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The  method  to  be  used  is  a  critical  point  method,  which  will  find  one  or  all  solutions 
of  the  system  of  equations 


(1.4)  $\nabla f(x) = 0$

in a rectangular region $X \subset R^n$, where $\nabla f(x)$ denotes the gradient vector

(1.5)  $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)^T$,

or just the critical points at which the value of $f$ is a maximum or minimum in $X$. Such
points will be called critical extremal points to distinguish them, if necessary, from
noncritical points on the boundary $\partial X$ of $X$ at which $f$ might attain a maximum or
minimum value.

The algorithm will make use of automatic differentiation [11] to compute the gradient
vector $\nabla f(x)$ of $f$ at $x = (x_1, x_2, \ldots, x_n)$, and also its Hessian matrix

(1.6)  $Hf(x) = \left( \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \right)$,  $i, j = 1, 2, \ldots, n$.

This technique will be combined with the use of interval arithmetic and interval evaluation
of library functions [8] in order to compute guaranteed bounds for values of functions and
their derivatives over the region of interest. The result will be an automatic, self-validating
optimization algorithm.

Automatic  differentiation  has  been  used,  at  least  in  a  restricted  form,  by  McCormick 
[6]  for  optimization  problems.  Interval  methods  have  been  applied  by  Hansen  [2],  [3]  and 
Hansen  and  Sengupta  [4]  to  global  optimization  problems,  including  constrained  problems. 
Although the basic algorithm given below is for unconstrained problems, the ideas
presented by Hansen indicate the possibility of introducing constraints into the calculations.

2.  Automatic Differentiation.  The basic idea behind automatic differentiation is to
use the formula or subroutine for the evaluation of the function $f$ at $x$ to obtain also values
of its derivatives at the same point. This is done by the introduction of a new
representation of variables, and arithmetic operations which include the rules for differentiation.
The resulting computational scheme is simple to program for computers [11], [13], and
avoids both the complexity of symbolic differentiation and the inaccuracy of numerical
differentiation. The new variables are triples

(2.1)  $U = (u, u', u'')$,

where $u \in R$ is a real number, $u' \in R^n$ is an n-dimensional real (column) vector, and
$u''$ is a symmetric real $n \times n$ matrix. The set of these elements will be denoted by $H_n$,
and each $U \in H_n$ is said to be of type HESSIAN. A variable $U$ of type HESSIAN will
be interpreted in the following way: Its first component $u$ will represent the value of a
real-valued function at some point $x \in R^n$, and $u'$ and $u''$ the values of its gradient vector
and Hessian matrix, respectively, at the same point.
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It  is  obvious  that  Hn  forms  a  linear  space.  More  importantly,  all  the  standard  arith 
metic  operations  can  be  defined  in  Hn: 

(2.2)  $U + V = (u, u', u'') + (v, v', v'') = (u + v, u' + v', u'' + v'')$,

(2.3)  $U - V = (u, u', u'') - (v, v', v'') = (u - v, u' - v', u'' - v'')$,

(2.4)  $U \cdot V = (u, u', u'') \cdot (v, v', v'') = (u \cdot v, \; u \cdot v' + v \cdot u', \; u \cdot v'' + u'v'^T + v'u'^T + v \cdot u'')$,

(2.5)  $U/V = (u, u', u'')/(v, v', v'') = \left( \frac{u}{v}, \; \frac{v \cdot u' - u \cdot v'}{v^2}, \; \frac{v^2 \cdot u'' - v \cdot (v'u'^T + u'v'^T) + 2u \cdot v'v'^T - uv \cdot v''}{v^3} \right)$,  $v \neq 0$.

The  above  definitions  implement  the  rules  for  evaluation  and  differentiation  of  sums, 
differences,  products,  and  quotients  of  functions  with  known  values  and  derivatives.  In 
order  to  use  an  algorithm  for  evaluation  of  a  real  function  to  obtain  the  corresponding 
values  in  Hn,  it  is  necessary  to  be  able  to  represent  the  independent  variables  x,,  t  = 
1,2, ...  ,n  and  constants  c  as  elements  of  Hn.  This  is  done  by  the  mapping 


(2.6)  $x_i \mapsto (x_i, e_i, 0)$,

for the ith independent variable $x_i$, where $e_i$ denotes the ith unit vector, and 0 the n × n
zero matrix. (0 will be used to denote zero vectors and matrices, as well as the real number
zero.) Similarly, constants c are represented by

(2.7)  $c \mapsto (c, 0, 0)$.

It follows that calculation of the value, gradient vector, and Hessian matrix of a rational
function can be done simply by making the substitutions (2.6) and (2.7), and applying the
rules (2.2)-(2.5). The results are exact, not numerical approximations, and are obtained
without symbolic manipulation.

In actual practice, instead of using the representation (2.7) for constants, it is simpler
to define a mixed arithmetic between elements $c \in R$ and $U = (u, u', u'') \in H^n$ [13]:

(2.8)  $c + U = U + c = (c + u,\; u',\; u'')$,

(2.9)  $c - U = (c - u,\; -u',\; -u'')$,

(2.10)  $U - c = (u - c,\; u',\; u'')$,

(2.11)  $c \cdot U = U \cdot c = (c \cdot u,\; c \cdot u',\; c \cdot u'')$,

(2.12)  $c / U = \left( \frac{c}{u},\; -\frac{c \cdot u'}{u^{2}},\; \frac{2c \cdot u' u'^{T} - cu \cdot u''}{u^{3}} \right), \quad u \neq 0$,

(2.13)  $U / c = \left( \frac{u}{c},\; \frac{u'}{c},\; \frac{u''}{c} \right), \quad c \neq 0$.

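For example, consider the two-dimensional Rosenbrock function ([1], p. 95):

(2.14)  $f(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2$.

In order to evaluate this function together with its gradient vector and Hessian matrix at
the point x = (-1.2, 1.0), one sets

(2.15)  $x_1 = \left( -1.2,\ \begin{pmatrix} 1 \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \right), \quad x_2 = \left( 1.0,\ \begin{pmatrix} 0 \\ 1 \end{pmatrix},\ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \right)$,

and evaluates (2.14) using the above rules. The result is

(2.16)  $f(x) = \left( 24.2,\ \begin{pmatrix} -215.6 \\ -88.0 \end{pmatrix},\ \begin{pmatrix} 1330.0 & 480.0 \\ 480.0 & 200.0 \end{pmatrix} \right)$,

which is exactly what one would get by differentiating (2.14) symbolically and then
evaluating the results for $x_1 = -1.2$, $x_2 = 1.0$ in real arithmetic.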
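The rules (2.2)-(2.4) and the mappings (2.6)-(2.7) can be tried out in any language with operator overloading. The following is a minimal sketch in Python rather than Pascal-SC; the class name `Hessian` and its methods are illustrative assumptions, division is omitted, and ordinary floating-point arithmetic is used throughout.

```python
class Hessian:
    """Triple (u, u', u''): value, gradient, and Hessian at one point."""

    def __init__(self, val, grad, hess):
        self.val, self.grad, self.hess = val, grad, hess

    @staticmethod
    def variable(x, i, n):
        # Rule (2.6): x_i -> (x_i, e_i, 0)
        grad = [1.0 if k == i else 0.0 for k in range(n)]
        return Hessian(x, grad, [[0.0] * n for _ in range(n)])

    @staticmethod
    def lift(c, n):
        # Rule (2.7): c -> (c, 0, 0)
        return Hessian(c, [0.0] * n, [[0.0] * n for _ in range(n)])

    def _coerce(self, other):
        if isinstance(other, Hessian):
            return other
        return Hessian.lift(other, len(self.grad))

    def __add__(self, other):
        # Rule (2.2)
        v, n = self._coerce(other), len(self.grad)
        return Hessian(self.val + v.val,
                       [self.grad[i] + v.grad[i] for i in range(n)],
                       [[self.hess[i][j] + v.hess[i][j] for j in range(n)]
                        for i in range(n)])
    __radd__ = __add__

    def __neg__(self):
        return Hessian(-self.val, [-g for g in self.grad],
                       [[-h for h in row] for row in self.hess])

    def __sub__(self, other):
        # Rule (2.3)
        return self + (-self._coerce(other))

    def __rsub__(self, other):
        return self._coerce(other) + (-self)

    def __mul__(self, other):
        # Rule (2.4)
        u, v, n = self, self._coerce(other), len(self.grad)
        grad = [u.val * v.grad[i] + v.val * u.grad[i] for i in range(n)]
        hess = [[u.val * v.hess[i][j] + u.grad[i] * v.grad[j]
                 + v.grad[i] * u.grad[j] + v.val * u.hess[i][j]
                 for j in range(n)] for i in range(n)]
        return Hessian(u.val * v.val, grad, hess)
    __rmul__ = __mul__


def rosenbrock(x1, x2):
    # Same expression as (2.14); works for floats and for Hessian triples.
    a = x2 - x1 * x1
    b = 1.0 - x1
    return 100.0 * (a * a) + b * b


x1 = Hessian.variable(-1.2, 0, 2)   # (2.15)
x2 = Hessian.variable(1.0, 1, 2)
f = rosenbrock(x1, x2)
print(f.val, f.grad, f.hess)
```

The function `rosenbrock` is written once, in the same form as (2.14), and the type of its arguments decides whether only the value or the full triple is computed.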

In addition to rational functions of several variables, other standard functions can be
defined readily on $H^n$. For example,

(2.17)  $\sin U = \sin(u, u', u'') = (\sin u,\; \cos u \cdot u',\; \cos u \cdot u'' - \sin u \cdot u' u'^{T})$.

In general, if $g : R \to R$ is twice differentiable, then it can be extended immediately to
the mapping $g : H^n \to H^n$ by use of the chain rule:

(2.18)  $g(U) = g((u, u', u'')) = (g(u),\; g'(u) \cdot u',\; g'(u) \cdot u'' + g''(u) \cdot u' u'^{T})$

[11], [13].
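The chain rule (2.18) can be sketched as a single higher-order function. The helper name `lift` and the tuple representation `(u, u', u'')` are assumptions made for this illustration only; the derivative functions g' and g'' are supplied by the caller.

```python
import math


def lift(g, dg, d2g):
    """Extend g: R -> R to triples (u, u', u'') by the chain rule (2.18)."""
    def G(U):
        u, du, d2u = U
        n = len(du)
        g1, g2 = dg(u), d2g(u)
        grad = [g1 * du[i] for i in range(n)]
        hess = [[g1 * d2u[i][j] + g2 * du[i] * du[j] for j in range(n)]
                for i in range(n)]
        return (g(u), grad, hess)
    return G


# With g = sin this reproduces (2.17).
hsin = lift(math.sin, math.cos, lambda t: -math.sin(t))
hexp = lift(math.exp, math.exp, math.exp)

# sin(x1 * x2) at (x1, x2) = (0.5, 2.0); the triple of the product x1 * x2
# is written out by hand from rule (2.4).
x1, x2 = 0.5, 2.0
P = (x1 * x2, [x2, x1], [[0.0, 1.0], [1.0, 0.0]])
S = hsin(P)
print(S)
```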

It is easy to program automatic differentiation in languages such as Ada and Pascal-SC
[13], which permit introduction of data types and additional definitions of the standard
operator symbols to manipulate such types. (This is sometimes called “overloading” the
standard operator symbols.) In these languages, the variables $x_1$ and $x_2$ in (2.14) would
be declared to be of type HESSIAN, along with the result f, and the evaluation would be
carried out on the basis of an expression of the same form as (2.14). In ordinary Pascal
or FORTRAN, (2.14) would have to be rewritten as a sequence of calls to subroutines
for addition, exponentiation, etc. [11]. The algorithms described in this paper have been
programmed in Pascal-SC, and some results are given in the final section.

3.  Interval  Computation.  In  ordinary  optimization  algorithms,  the  function  to  be 
optimized  is  sampled  only  at  a  discrete  set  of  points.  This  can  result  in  the  loss  of  valuable 
information  about  the  function.  The  algorithms  presented  in  this  paper,  on  the  other  hand, 
use  interval  computation,  which  produces  guaranteed  bounds  for  the  values  of  functions 
and  their  derivatives  over  entire  regions  [8].  This  prevents  the  process  from  being  misled 
by  incomplete  information. 

The basic component of interval computation is interval arithmetic [8]. Let IR denote
the set of bounded, closed intervals on the real line R. For $I = [a, b] \in IR$, $J = [c, d] \in IR$,
the arithmetic operations are defined by

(3.1)  $I * J = [a, b] * [c, d] = \{ x * y \mid x \in I,\ y \in J \} = [r, s]$,

where $* \in \{+, -, \cdot, /\}$ and division by an interval containing 0 is excluded. In actual
implementation on computers, directed rounding is used (downward for lower endpoints,
upward for upper endpoints), so the actual result computed is $[\nabla r, \Delta s]$, which always
contains the exact result [r, s] of the interval operation.
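The effect of directed rounding in (3.1) can be imitated in Python, which does not expose the rounding mode directly. The sketch below is an assumption-laden stand-in: each endpoint is computed in round-to-nearest and then moved one floating-point number outward with `math.nextafter` (Python 3.9 or later), which gives a slightly wider but still valid enclosure as long as no overflow occurs. Intervals are plain `(lo, hi)` tuples and the function names are illustrative.

```python
import math


def down(x):
    # Next floating-point number toward -infinity (stand-in for rounding down).
    return math.nextafter(x, -math.inf)


def up(x):
    # Next floating-point number toward +infinity (stand-in for rounding up).
    return math.nextafter(x, math.inf)


def iadd(I, J):
    return (down(I[0] + J[0]), up(I[1] + J[1]))


def isub(I, J):
    return (down(I[0] - J[1]), up(I[1] - J[0]))


def imul(I, J):
    p = [I[0] * J[0], I[0] * J[1], I[1] * J[0], I[1] * J[1]]
    return (down(min(p)), up(max(p)))


def idiv(I, J):
    if J[0] <= 0.0 <= J[1]:
        raise ZeroDivisionError("divisor interval contains 0")
    q = [I[0] / J[0], I[0] / J[1], I[1] / J[0], I[1] / J[1]]
    return (down(min(q)), up(max(q)))


r = iadd((1.0, 2.0), (3.0, 4.0))      # encloses [4, 6]
p = imul((-1.0, 2.0), (3.0, 4.0))     # encloses [-4, 8]
s = iadd((0.1, 0.1), (0.2, 0.2))      # a nondegenerate interval around 0.3
print(r, p, s)
```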

Evaluation of a real rational function $f : R \to R$ in interval arithmetic results in an
interval inclusion $F : IR \to IR$ of f [8], which has the property

(3.2)  $f(X) = \{ f(x) \mid x \in X \} \subseteq F(X), \quad X \in IR$.

Denoting the endpoints of an interval I = [a, b] by inf I = a, sup I = b, respectively, (3.2)
means that

(3.3)  $\inf F(X) \le f(x) \le \sup F(X), \quad x \in X$.

These bounds for the range of f(x) over X are obtained automatically, without investigation
of the minimum and maximum values of f(x) on X, and are furthermore guaranteed
(although they may be somewhat crude) [8]. This is the basis of the self-validating
character of interval computation. Furthermore, interval extensions obtained by using interval
arithmetic are monotone in the sense that $X \subseteq Y$ implies that $F(X) \subseteq F(Y)$. In exact
arithmetic, F is an extension of f in the sense that F([x, x]) = f(x) for $x \in R$ [8]. In
what follows, x will be used to denote the degenerate interval $[x, x] \in IR$ as well as the
real number x. Other handy notations to be used from time to time are

(3.4)  $w(I) = w([a, b]) = b - a, \quad m(I) = m([a, b]) = \frac{a + b}{2}$

for the width and midpoint, respectively, of an interval $I \in IR$.

Just as in the case of differentiation arithmetic, interval arithmetic can be extended to
include various standard functions encountered in applications. Efficient implementations
of interval arithmetic and interval inclusions of standard functions are now available in a
number of computational environments, for example, Pascal-SC for microcomputers and
the ACRITH package for IBM 370 computers.

The space $IR^n$ of interval vectors $X = (X_1, X_2, \ldots, X_n)$ is defined in the same way as
$R^n$, and the notions of interval matrices and vector and matrix-vector interval arithmetic
arise in a natural way. The interval scalar product of interval vectors X, Y is defined to be

(3.5)  $X \cdot Y = \sum_{i=1}^{n} X_i Y_i$,

and the notation m(X) will be used for the midpoint

(3.6)  $m(X) = (m(X_1), m(X_2), \ldots, m(X_n))$

of the interval vector X.

Now, if the evaluation of the function (2.14) is performed in interval arithmetic with
$x_1 = [0.9, 1.2]$, $x_2 = [0.8, 1.1]$, then the result is

(3.7)  $F(X) = [0.0, 41.0]$,

where X denotes the interval vector X = ([0.9, 1.2], [0.8, 1.1]). This means that

(3.8)  $0 \le f(x) \le 41$

for $0.9 \le x_1 \le 1.2$, $0.8 \le x_2 \le 1.1$. Thus, the bounds (3.8) are obtained automatically,
simply by evaluation of (2.14) in interval arithmetic, in much the same way that values
of the gradient vector and Hessian matrix of (2.14) were obtained in §2 by the use of
differentiation arithmetic. Furthermore, as stated above, the bounds given by interval
arithmetic are guaranteed to be valid.
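The bound (3.7) can be reproduced with a few lines of interval code. This sketch uses `(lo, hi)` tuples and ordinary floating-point arithmetic without outward rounding, so it illustrates the calculation but is not itself a validated computation; all function names are assumptions. It also assumes that the squares in (2.14) are evaluated by a dedicated interval square, since the lower bound 0.0 in (3.7) is not obtained if each square is formed as a general interval product.

```python
def iadd(I, J):
    return (I[0] + J[0], I[1] + J[1])


def isub(I, J):
    return (I[0] - J[1], I[1] - J[0])


def imul(I, J):
    p = [I[0] * J[0], I[0] * J[1], I[1] * J[0], I[1] * J[1]]
    return (min(p), max(p))


def isqr(I):
    # Interval square: never negative, unlike the general product I * I.
    p, q = I[0] * I[0], I[1] * I[1]
    return (0.0 if I[0] <= 0.0 <= I[1] else min(p, q), max(p, q))


def rosenbrock_range(X1, X2, square):
    one, hundred = (1.0, 1.0), (100.0, 100.0)
    a = isub(X2, square(X1))
    b = isub(one, X1)
    return iadd(imul(hundred, square(a)), square(b))


X1, X2 = (0.9, 1.2), (0.8, 1.1)
F = rosenbrock_range(X1, X2, isqr)                         # close to [0.0, 41.0]
F_naive = rosenbrock_range(X1, X2, lambda I: imul(I, I))   # wider lower bound
print(F, F_naive)
```

Both results enclose the range of f over X; the second is simply cruder, which is the kind of overestimation the text warns about.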

The next step is to combine the differentiation arithmetic in §2 with interval arithmetic.
An element T of type IHESSIAN will be a triple

(3.9)  $T = (U, U', U'')$,

where $U \in IR$ is an interval, $U' \in IR^n$ is an interval (column) vector, and $U''$ is a symmetric
interval n × n matrix. The resulting set of elements will be denoted by $IH^n$. Arithmetic
operations in $IH^n$ are defined by (2.2)-(2.5), with the operations inside the parentheses
replaced by the interval operations (3.1). Similarly, operations between constants $c \in IR$
and elements of $IH^n$ are defined by (2.8)-(2.13) and the corresponding interval operations.
Real constants c are mapped into IR by $c \mapsto [c, c]$, as before.

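For example, the evaluation of (2.14) as type IHESSIAN can be carried out over the
intervals $0.9 \le x_1 \le 1.2$, $0.8 \le x_2 \le 1.1$ by setting

(3.10)  $x_1 = \left( [0.9, 1.2],\ \begin{pmatrix} [1, 1] \\ [0, 0] \end{pmatrix},\ \begin{pmatrix} [0, 0] & [0, 0] \\ [0, 0] & [0, 0] \end{pmatrix} \right), \quad x_2 = \left( [0.8, 1.1],\ \begin{pmatrix} [0, 0] \\ [1, 1] \end{pmatrix},\ \begin{pmatrix} [0, 0] & [0, 0] \\ [0, 0] & [0, 0] \end{pmatrix} \right)$.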
The result is

(3.11)  $\Phi(X) = (F(X), F'(X), F''(X)) = \left( [0.0, 41.0],\ \begin{pmatrix} [-139.4, 307.6] \\ [-128.0, 58.0] \end{pmatrix},\ \begin{pmatrix} [534.0, 1410.0] & [-480.0, -360.0] \\ [-480.0, -360.0] & [200.0, 200.0] \end{pmatrix} \right)$,

where X denotes the interval vector X = ([0.9, 1.2], [0.8, 1.1]). This gives not only the
bounds (3.8) for f(x) over X, but also the bounds

(3.12)  $\nabla f(x) \in F'(X) = \begin{pmatrix} [-139.4, 307.6] \\ [-128.0, 58.0] \end{pmatrix}$

for the gradient vector of f, and

(3.13)  $Hf(x) \in F''(X) = \begin{pmatrix} [534.0, 1410.0] & [-480.0, -360.0] \\ [-480.0, -360.0] & [200.0, 200.0] \end{pmatrix}$

for the Hessian matrix of f over X. These bounds, obtained automatically by the use of
IHESSIAN arithmetic, are guaranteed.
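The IHESSIAN evaluation of (2.14) can be sketched by running the differentiation rules over interval components. The code below is an illustration under several assumptions: the names `I`, `var`, `const`, `add`, `sub`, `scale`, and `sqr` are invented here, squares are handled by the chain rule (2.18) with g(u) = u^2 and an interval square for the value, and plain floating-point arithmetic is used without outward rounding, so the endpoints agree with (3.11) only up to rounding error.

```python
class I:
    """Closed interval [lo, hi]; plain floats, no outward rounding."""

    def __init__(self, lo, hi=None):
        self.lo, self.hi = lo, (lo if hi is None else hi)

    def __add__(self, o):
        return I(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return I(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o):
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return I(min(p), max(p))

    def sqr(self):
        p, q = self.lo * self.lo, self.hi * self.hi
        return I(0.0 if self.lo <= 0.0 <= self.hi else min(p, q), max(p, q))


ZERO, ONE, TWO = I(0.0), I(1.0), I(2.0)


def var(X, i, n):
    # Interval form of (2.6): (X_i, e_i, 0)
    return (X, [ONE if k == i else ZERO for k in range(n)],
            [[ZERO] * n for _ in range(n)])


def const(c, n):
    # Interval form of (2.7): ([c, c], 0, 0)
    return (I(c), [ZERO] * n, [[ZERO] * n for _ in range(n)])


def add(U, V):
    n = len(U[1])
    return (U[0] + V[0], [U[1][i] + V[1][i] for i in range(n)],
            [[U[2][i][j] + V[2][i][j] for j in range(n)] for i in range(n)])


def sub(U, V):
    n = len(U[1])
    return (U[0] - V[0], [U[1][i] - V[1][i] for i in range(n)],
            [[U[2][i][j] - V[2][i][j] for j in range(n)] for i in range(n)])


def scale(c, U):
    # Rule (2.11) with an interval constant c.
    return (c * U[0], [c * g for g in U[1]],
            [[c * h for h in row] for row in U[2]])


def sqr(U):
    # Chain rule (2.18) with g(u) = u^2, g'(u) = 2u, g''(u) = 2.
    u, du, d2u = U
    n = len(du)
    return (u.sqr(), [TWO * (u * du[i]) for i in range(n)],
            [[TWO * (u * d2u[i][j] + du[i] * du[j]) for j in range(n)]
             for i in range(n)])


def rosenbrock(x1, x2):
    n = len(x1[1])
    return add(scale(I(100.0), sqr(sub(x2, sqr(x1)))),
               sqr(sub(const(1.0, n), x1)))


F, dF, d2F = rosenbrock(var(I(0.9, 1.2), 0, 2), var(I(0.8, 1.1), 1, 2))
print((F.lo, F.hi), [(g.lo, g.hi) for g in dF])
```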

In addition to bounds for the values of f and its derivatives on X, the IHESSIAN
computation (3.11) provides information about the continuity of f and $\nabla f$ on X. For an
interval $I = [a, b] \in IR$, let

(3.14)  $|I| = |[a, b]| = \max\{ |a|, |b| \}$.

If $X = (X_1, X_2, \ldots, X_n)$ is an interval vector, then $\|X\|$ will denote the quantity

(3.15)  $\|X\| = \max_{i} |X_i|$,

and for an n × n interval matrix $M = (M_{ij})$, let

(3.16)  $\|M\| = \max_{i} \sum_{j=1}^{n} |M_{ij}|$,

analogously to the ∞-norm in $R^n$ [8]. If the IHESSIAN value of a function $f : R^n \to R$
over $X \in IR^n$ is denoted by $\Phi(X) = (F(X), F'(X), F''(X))$, then the existence of $F'(X)$
implies that f is Lipschitz continuous on X, and $L = \|F'(X)\|$ is a Lipschitz constant for
f on X. Similarly, the existence of $F''(X)$ implies that $\nabla f$ is Lipschitz continuous on X,
and $\|F''(X)\|$ is a Lipschitz constant for $\nabla f$ on X. Thus, for the function (2.14), it follows
from (3.11) that

(3.17)  $|f(x) - f(y)| \le 307.6 \cdot \|x - y\|_\infty$,

and

(3.18)  $\|\nabla f(x) - \nabla f(y)\|_\infty \le 1890.0 \cdot \|x - y\|_\infty$

for $x, y \in X$ = ([0.9, 1.2], [0.8, 1.1]) [14]. The Lipschitz continuity of $\nabla f$ will enter into the
discussion later.
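The constants in (3.17) and (3.18) follow directly from the definitions (3.14)-(3.16) applied to the bounds in (3.11). A short sketch, with intervals as `(lo, hi)` tuples and illustrative function names:

```python
def mag(I):
    # (3.14): |[a, b]| = max{|a|, |b|}
    return max(abs(I[0]), abs(I[1]))


def vec_norm(X):
    # (3.15): maximum of the component magnitudes
    return max(mag(Xi) for Xi in X)


def mat_norm(M):
    # (3.16): maximum row sum of the entry magnitudes
    return max(sum(mag(Mij) for Mij in row) for row in M)


# Gradient and Hessian bounds taken from (3.11).
dF = [(-139.4, 307.6), (-128.0, 58.0)]
d2F = [[(534.0, 1410.0), (-480.0, -360.0)],
       [(-480.0, -360.0), (200.0, 200.0)]]

L_f = vec_norm(dF)       # constant used in (3.17)
L_grad = mat_norm(d2F)   # constant used in (3.18): 1410.0 + 480.0
print(L_f, L_grad)
```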

In actual practice, the arithmetic operators and standard library functions for type
IHESSIAN can be programmed once and for all, and stored in a small subroutine library,
as has been done in Pascal-SC [13].

4. Tests for Existence or Nonexistence of Critical Points. The key issue in the
algorithm described in this paper is to determine if a region in $R^n$ defined by an interval
vector $X \in IR^n$ contains a critical point of the function $f : R^n \to R$ or not. First of all, if

(4.1)  $0 \notin F'(X)$,

then it is impossible that $\nabla f(x) = 0$ for $x \in X$, and X can be rejected, since it does not
contain a critical point of f [7]. On the other hand, $0 \in F'(X)$ does not necessarily mean
that X contains a critical point of f, because $F'(X)$ overestimates $\nabla f(X)$ in general. The
intersection of all interval vectors containing f(X) is called the interval hull of f(X). In
several dimensions, the interval hull of f(X) can contain points outside of f(X) in general
[8].

In  addition  to  the  rejection  criterion  (4.1),  a  test  which  is  capable  of  establishing  the 
existence  of  a  critical  point  x*  in  X  is  necessary.  For  this  purpose,  the  test  given  by  Moore 
[m3]  will  be  used.  This  test  is  based  on  the  application  of  the  Krawczyk  transformation  K 

[5] to X:

(4.2)  $K(X) = x - (Hf(x))^{-1} \nabla f(x) + \{ I - (Hf(x))^{-1} F''(X) \}(X - x)$,

where I denotes here the n × n identity matrix, and x = m(X) the midpoint vector of the
interval region X. The real-valued vectors and matrices in (4.2) are of course interpreted
as degenerate interval-valued objects.

In actual practice, K(X) is computed by solving the linear system

(4.3)  $(Hf(x))\, \Xi = -\nabla f(x) + \{ Hf(x) - F''(X) \}(X - x)$

for $\Xi$, from which

(4.4)  $K(X) = x + \Xi$.

Once K(X) has been computed, one of the following alternatives holds. If

(4.5)  $K(X) \subseteq X$,

then there exists a critical point $x^* \in K(X)$ of f; if the intersection

(4.6)  $K(X) \cap X = \emptyset$

is empty, then X does not contain a critical point of f (another rejection criterion);
otherwise, the test is inconclusive [7].
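In one dimension the three outcomes (4.5), (4.6), and "inconclusive" are easy to see. The sketch below applies (4.2) to the hypothetical function f(x) = x^4/4 - x, chosen only for this illustration, whose single critical point is x* = 1. Intervals are `(lo, hi)` tuples and ordinary floating-point arithmetic is used without directed rounding, so the outcomes illustrate the test rather than prove anything.

```python
def df(x):
    # f'(x) for the illustrative f(x) = x**4 / 4 - x
    return x ** 3 - 1.0


def d2f(x):
    # f''(x)
    return 3.0 * x * x


def D2F(X):
    """Interval extension F''(X) of f'' over X = (lo, hi)."""
    p, q = X[0] * X[0], X[1] * X[1]
    s_lo = 0.0 if X[0] <= 0.0 <= X[1] else min(p, q)
    return (3.0 * s_lo, 3.0 * max(p, q))


def krawczyk(X):
    """K(X) from (4.2) with n = 1; requires f''(m(X)) != 0."""
    x = 0.5 * (X[0] + X[1])
    h = d2f(x)
    lo2, hi2 = D2F(X)
    # Magnitude of the interval 1 - F''(X)/h; X - x is symmetric about 0,
    # so the product with it is the symmetric interval of radius mag * rad.
    mag = max(abs(1.0 - lo2 / h), abs(1.0 - hi2 / h))
    r = mag * (X[1] - x)
    c = x - df(x) / h
    return (c - r, c + r)


def classify(X):
    K = krawczyk(X)
    if X[0] <= K[0] and K[1] <= X[1]:
        return "exists"          # (4.5): K(X) is contained in X
    if K[1] < X[0] or X[1] < K[0]:
        return "none"            # (4.6): K(X) and X are disjoint
    return "inconclusive"


K = krawczyk((0.9, 1.2))
print(K, classify((0.9, 1.2)), classify((2.0, 3.0)), classify((0.0, 3.0)))
```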

With regard to (4.6), the intersection of interval vectors X, Y is said to be empty if
for some i, $X_i$ and $Y_i$ are disjoint intervals. It also follows from (4.2) that if $x^* \in X$ is a
critical point of f, then $x^* \in K(X)$. Thus, in the inconclusive case, the region

(4.7)  $Z = X \cap K(X) \subseteq X$

will also contain any critical points of f which lie in X. This suggests decomposing Z (which
may be equal to X) into several subregions, and applying the above tests to each. The
resulting algorithms, described in more detail below, are essentially modifications of the
one given by Moore and Jones [9] for locating solutions of systems of nonlinear equations
in several variables. These algorithms differ from the Moore-Jones method in that they
make use of the fact that an optimization problem underlies the system of equations being
solved, which provides information additional to that inherent in an arbitrary system of
nonlinear equations. Furthermore, the algorithms given here differ by bisecting intersected
intervals in the inconclusive case, which results in a certain amount of increase in speed.

The use of subregions has the advantage that the tests (4.1) and (4.5)-(4.6) become
more sensitive as the size of the region decreases, that is, as $\|w(X)\|_\infty \to 0$. In fact, if
x* is a regular critical point of f, that is, if $(Hf(x^*))^{-1}$ exists, then (4.5) will hold for
sufficiently small X such that $x^* \in X$ if $\nabla f$ is Lipschitz continuous [10]. The disadvantages
are the extra bookkeeping and storage required for pending subregions. However, these
are not overwhelming on modern computers.

5. Implementation of the Krawczyk Transformation. The system of equations
(4.3), as stated, has the real coefficient matrix Hf(x), interpreted as a degenerate interval
matrix, and an interval right side. In actual computation, instead of solving (4.3), one
obtains an inclusion $\hat{\Xi}$ of the solution Y of the system

(5.1)  $F''(x)\, Y = -F'(x) + \{ F''(x) - F''(X) \}(X - x)$,

which has an interval coefficient matrix and an interval right side. The solution of such a
system

(5.2)  $AY = B$

is defined to be

(5.3)  $Y = \{ y \mid ay = b,\ a \in A,\ b \in B \}$,

where a is a real matrix, and $b, y \in R^n$, provided all the indicated real systems are solvable.
In this case, it follows that $\Xi \subseteq Y \subseteq \hat{\Xi}$, where $\Xi$ is the solution of (4.3), and thus

(5.4)  $K(X) = x + \Xi \subseteq x + \hat{\Xi} = \hat{K}(X)$.

Furthermore,

(5.5)  $\hat{K}(X) \subseteq X$

or

(5.6)  $\hat{K}(X) \cap X = \emptyset$

thus imply (4.5) or (4.6), respectively. In this way, existence or nonexistence of a critical
point of f in X can be established conclusively by a computation done in floating-point
interval arithmetic by a computer, providing it is possible to obtain an inclusion for the
solution of the system (5.1). In actual practice, the widths of the components of $F''(x)$
will be small, and good inclusions of Y can be obtained by a process of floating-point
approximation followed by interval iterative refinement [15], which will fail only if $F''(x)$
contains a singular or very badly conditioned matrix. Interval linear system solvers of this
type are available in Pascal-SC and the ACRITH package.

6. The Basic Algorithm. The algorithm described in this section will find one or all
the critical points of a function f in a given initial region X, or show that X contains
no critical points, provided no exceptions arise. Exceptions will be discussed in a later
section. The computer program implementing this algorithm handles exceptions in such
a way that the computation always terminates in a finite number of steps. Validated
upper and lower bounds are given for all critical points and values found. The algorithm
will be presented first for the case of a single processor, in the way it has actually been
implemented. Adaptation to a multiprocessor environment will be discussed at the end of
the section.

The  basic  steps  of  the  algorithm  are: 

1°. Compute $\Phi(X) = (F(X), F'(X), F''(X))$ in IHESSIAN arithmetic. If $0 \notin F'(X)$,
then X is rejected.

2°. Compute $\Phi(x) = (F(x), F'(x), F''(x))$ in IHESSIAN arithmetic. Compute an
inclusion $\hat{K}(X)$ of the Krawczyk transformation of X by (5.1).

3°. If $\hat{K}(X) \subseteq X$, then the interval iteration

(6.1)  $X^{0} = X, \quad X^{k+1} = \hat{K}(X^{k}) \cap X^{k}$

is performed until it converges to

(6.2)  $X^{*} = X^{N} \subseteq X^{N+1}$

in a finite number of steps [12]. The values

(6.3)  $X^{*}, \quad \Phi(X^{*}) = (F(X^{*}), F'(X^{*}), F''(X^{*}))$

are output. The existence of a critical point x* of f in X is guaranteed, and furthermore
the bounds

(6.4)  $x^{*} \in X^{*}, \quad (f(x^{*}), \nabla f(x^{*}), Hf(x^{*})) \in (F(X^{*}), F'(X^{*}), F''(X^{*}))$

hold for x* and the values of the function f, its gradient vector $\nabla f$, and its Hessian matrix Hf
at x*. These bounds are usually as good as can be obtained by floating-point computation,
and $F''(X^{*})$ can be used to determine the nature of the critical point x*, if necessary.
4°. If

(6.5)  $\hat{K}(X) \cap X = \emptyset$,

then X is rejected.

5°. In the indeterminate case, the region

(6.6)  $Z = (Z_1, Z_2, \ldots, Z_n) = \hat{K}(X) \cap X$

is bisected in the direction of its widest component. An index j is determined such that

(6.7)  $w(Z_j) \ge w(Z_i), \quad i = 1, 2, \ldots, n$,

and one takes

(6.8)  $Z^{l} = (Z_1, \ldots, Z_{j-1}, [\inf Z_j, m(Z_j)], Z_{j+1}, \ldots, Z_n)$,
       $Z^{r} = (Z_1, \ldots, Z_{j-1}, [m(Z_j), \sup Z_j], Z_{j+1}, \ldots, Z_n)$.


6°. (Single processor) Compute $\Phi(Z^{l})$, $\Phi(Z^{r})$. The test (4.1) is applied to $F'(Z^{l})$ and
$F'(Z^{r})$. There are four cases:

(i) If $0 \notin F'(Z^{l})$ and $0 \notin F'(Z^{r})$, then X is rejected;

(ii) If $0 \in F'(Z^{l})$ but $0 \notin F'(Z^{r})$, then return to step 2° with $X = Z^{l}$;

(iii) If $0 \notin F'(Z^{l})$ but $0 \in F'(Z^{r})$, then return to step 2° with $X = Z^{r}$;

(iv) If $0 \in F'(Z^{l})$ and $0 \in F'(Z^{r})$, then one of the interval vectors is to be placed on a
push-down (last in, first out) stack in storage, while the other replaces X for continued
processing at step 2°. In this algorithm, the choice is made by the following heuristic:
If $w(F(Z^{l})) < w(F(Z^{r}))$, then one takes $X = Z^{l}$ and stacks $Z^{r}$ for processing later;
otherwise, one takes $X = Z^{r}$ and stacks $Z^{l}$.

The region selected for X is considered to be “more promising” than the one stacked
because the variation of a function in the neighborhood of a critical point is asymptotically
less than it is elsewhere. The goal is to find critical points as quickly as possible, particularly
if only one is desired.

6°. (Multiprocessors) $Z^{l}$ is sent to another processor following the bisection (6.8).
Return to step 1° with the current processor taking $X = Z^{r}$, while the other takes $X = Z^{l}$.
If no processors happen to be free, then $Z^{l}$ circulates or is put on a common stack to await
the first available processor. If a number of processors are free, it could be expeditious to
decompose X into more than two subregions, and then send one to each processor. The
choices here will depend to a great extent on the multiprocessor configuration actually
used.
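The bisection (6.7)-(6.8) and the stacking heuristic of step 6°(iv) can be sketched as follows. Interval vectors are lists of `(lo, hi)` tuples; the function names and the range function `F_demo` are hypothetical and serve only to exercise the code.

```python
def bisect_widest(Z):
    """Split Z along its widest component, as in (6.7)-(6.8)."""
    widths = [hi - lo for lo, hi in Z]
    j = widths.index(max(widths))        # first index attaining the maximum
    lo, hi = Z[j]
    mid = 0.5 * (lo + hi)
    Zl = Z[:j] + [(lo, mid)] + Z[j + 1:]
    Zr = Z[:j] + [(mid, hi)] + Z[j + 1:]
    return j, Zl, Zr


def order_by_width(Zl, Zr, F):
    """Step 6(iv): return (region to process next, region to stack)."""
    Fl, Fr = F(Zl), F(Zr)
    if Fl[1] - Fl[0] < Fr[1] - Fr[0]:
        return Zl, Zr
    return Zr, Zl


def F_demo(Z):
    # Hypothetical range enclosure used only for this demonstration.
    return (0.0, sum(hi for lo, hi in Z))


j, Zl, Zr = bisect_widest([(0.9, 1.2), (0.0, 1.0)])
nxt, stacked = order_by_width(Zl, Zr, F_demo)
print(j, Zl, Zr)
```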

In the case of a single processor, the choice of which interval to stack in step 6°(iv)
will be modified in the case global critical maxima or minima are sought. If only one
critical point is sought, the algorithm is terminated at the end of step 3°. Otherwise, the
algorithm can continue until all critical points of f in X are found, and no regions remain
on the stack to process. Complete processing of X without finding critical points proves
that it contained none.
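The overall control flow of steps 1°-6° can be illustrated in one dimension. The sketch below is a deliberately simplified stand-in, not the program described in this paper: it uses the hypothetical function f(x) = x^4/4 - x^2/2 (critical points -1, 0, 1), plain floating-point arithmetic without outward rounding, a relative width tolerance `eps` in the spirit of the exception handling described later, and it pushes both halves on the stack and applies the gradient test when a region is popped instead of using the heuristic of step 6°(iv).

```python
def isqr(X):
    p, q = X[0] * X[0], X[1] * X[1]
    return (0.0 if X[0] <= 0.0 <= X[1] else min(p, q), max(p, q))


def dF(X):
    # F'(X) for f'(x) = x^3 - x (the cube is monotone)
    return (X[0] ** 3 - X[1], X[1] ** 3 - X[0])


def d2F(X):
    # F''(X) for f''(x) = 3x^2 - 1
    s = isqr(X)
    return (3.0 * s[0] - 1.0, 3.0 * s[1] - 1.0)


def krawczyk(X):
    x = 0.5 * (X[0] + X[1])
    h = 3.0 * x * x - 1.0
    if h == 0.0:
        return None                                # singular case
    lo2, hi2 = d2F(X)
    mag = max(abs(1.0 - lo2 / h), abs(1.0 - hi2 / h))
    c = x - (x ** 3 - x) / h
    r = mag * (X[1] - x)
    return (c - r, c + r)


def search(X0, eps=1e-10):
    stack, found, unresolved = [X0], [], []
    w0 = X0[1] - X0[0]
    while stack:
        X = stack.pop()
        g = dF(X)
        if g[0] > 0.0 or g[1] < 0.0:               # step 1: 0 not in F'(X)
            continue
        K = krawczyk(X)                            # step 2
        if K is None or X[1] - X[0] < eps * w0:    # exceptional region
            unresolved.append(X)
            continue
        if X[0] <= K[0] and K[1] <= X[1]:          # step 3: existence
            for _ in range(60):                    # interval iteration (6.1)
                Xn = (max(K[0], X[0]), min(K[1], X[1]))
                if Xn[0] > Xn[1] or Xn == X:
                    break
                X = Xn
                K = krawczyk(X)
                if K is None:
                    break
            found.append(X)
            continue
        Z = (max(K[0], X[0]), min(K[1], X[1]))     # (6.6)
        if Z[0] > Z[1]:                            # step 4: empty intersection
            continue
        m = 0.5 * (Z[0] + Z[1])                    # step 5: bisect
        stack.append((Z[0], m))
        stack.append((m, Z[1]))
    return found, unresolved


found, unresolved = search((-2.0, 2.5))
print(found, unresolved)
```

Regions that shrink below the tolerance without a decision are collected in `unresolved` rather than discarded, so no part of the initial interval is silently dropped.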

Because the processes of intersection and bisection can result in subregions of a wide
range of sizes, a simple count of the number processed at any given time does not give a
good indication of the progress being made by the algorithm. For this reason, it has been
found convenient to compute the initial volume

(6.9)  $V_0 = \prod_{i=1}^{n} w(X_i)$

of the region $X = (X_1, X_2, \ldots, X_n)$ to be searched. The unexplored part of the initial
region has volume $V_u(t)$ at time t, where $V_u(0) = V_0$. $V_u(t)$ can be computed simply as
the sum of the volumes of the intervals being processed and those on the stack awaiting
processing at time t. $V_u(t)$ is a monotone decreasing function of t, and the algorithm
terminates when the stack is empty and $V_u(t) = 0$, if an exhaustive search is desired.
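The progress measure (6.9) is a one-line computation. In this sketch the names `volume` and `unexplored` are illustrative, and regions are lists of `(lo, hi)` tuples.

```python
import math


def volume(X):
    # (6.9): product of the component widths
    return math.prod(hi - lo for lo, hi in X)


def unexplored(current, stack):
    # V_u(t): regions being processed plus those waiting on the stack
    return sum(volume(R) for R in current + stack)


X0 = [(0.9, 1.2), (0.8, 1.1)]
V0 = volume(X0)

# After one bisection of the first component, nothing has been rejected yet,
# so the unexplored volume is unchanged.
left = [(0.9, 1.05), (0.8, 1.1)]
right = [(1.05, 1.2), (0.8, 1.1)]
Vu = unexplored([left], [right])
print(V0, Vu)
```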

7. Exceptions. Several exceptions can arise in the execution of the algorithm in §6 which
could terminate the computation prematurely, or cause it to run indefinitely. These and
the way they are handled will be discussed now, because they may also occur in the search
for global critical extrema.

1°. If $F''(x)$ contains a singular or badly conditioned matrix, then the attempt to
perform the Krawczyk transformation by solving (5.1) will fail. One solution is to replace
$F''(x)$ by some nonsingular matrix, for example, $m(F''(X))$ could work [9]. The implementation
used for the examples given below simply outputs X to a file for later examination,
with an appropriate message, and then selects the next region to be processed from the
stack.

2°. The intersection-bisection process can lead to regions which do not differ from
the previous ones, because of outward rounding, or which are so small that total time
to explore the entire region is prohibitive. This can happen, in particular, if a critical
point lies exactly on a bisection coordinate. For this reason, the user is provided with a
parameter $\epsilon$ such that if the volume V of the region to be processed satisfies

(7.1)  $V < \epsilon \cdot V_0$,

then the region will be output to a file for later examination, with an appropriate message.

The choice $\epsilon = 0$ is permitted; this allows the processing of smaller and smaller regions
until their volume (6.9) underflows to 0 or some coordinate becomes degenerate.

3°. If the storage space allotted to the stack is full, then additional regions will be
output to a file.

4°. Numerical exceptions, such as division by zero and overflow, are allowed to terminate
the present program. However, they could be used as signals to output the offending
region to a file with an appropriate message.

Successful termination of the computer program will be accompanied by a list of
the number of regions processed (given by the number of times the Krawczyk transformation
was performed), the number rejected, the number of critical points found, and
the number output to the exception file. Given all its critical points, the global critical
maximum and minimum of the function can be found simply by sorting the function values.

8. Global Critical Extrema. The algorithm of §6 can be sped up if only the global
critical maximum or minimum of f on X is desired. The modification of the algorithm to
find the global critical minimum will be described; finding the global critical maximum
follows exactly the same pattern.

Suppose first that the global critical minimum of f on X is actually its global minimum
on X, that is, $f(x) \ge m = f(x^*)$ for $x \in X$, where x* is the global critical minimum point.
The algorithm will compute a decreasing sequence of upper bounds m(t) for m, and reject
subregions Z such that $\inf F(Z) > m(t)$. The modifications of the corresponding steps for
the single-processor algorithm are:

1°. Set m(0) = MAXREAL, the largest floating-point number (for example, in Pascal-SC,
MAXREAL = $9.99999999999 \times 10^{99}$).

2°. If $\sup F(x) < m(t - 1)$, then set $m(t) = \sup F(x)$; otherwise, $m(t) = m(t - 1)$.

6°. Reject $Z^{l}$ if $0 \notin F'(Z^{l})$ or $\inf F(Z^{l}) > m(t)$; reject $Z^{r}$ if $0 \notin F'(Z^{r})$ or
$\inf F(Z^{r}) > m(t)$. If neither $Z^{l}$ nor $Z^{r}$ can be rejected, then $Z^{l}$ is considered to be
more promising if $\inf F(Z^{l}) < \inf F(Z^{r})$, and $Z^{r}$ is stacked, or conversely. In case
$\inf F(Z^{l}) = \inf F(Z^{r})$, then $Z^{r}$ is stacked if $\sup F(Z^{r}) > \sup F(Z^{l})$; otherwise, $Z^{l}$ is stacked.

Considerable  savings  in  computer  time  have  been  observed  due  to  the  introduction  of 
the  additional  rejection  conditions  in  6°.  In  the  case  of  multiprocessors,  an  efficient  way  to 
share  the  current  value  of  m(t)  is  necessary,  and  the  rejection  of  regions  in  which  function 
values  are  too  large  would  be  carried  out  in  step  1°. 
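The modified steps 2° and 6° amount to a running upper bound and an extra rejection test. A sketch under stated assumptions: intervals are `(lo, hi)` tuples, `dF` is a list of gradient component intervals, `sys.float_info.max` stands in for MAXREAL, and the function names are illustrative.

```python
import sys

MAXREAL = sys.float_info.max    # stand-in for the Pascal-SC constant


def update_bound(m_prev, F_mid):
    # Step 2: m(t) = sup F(x) if that is smaller than m(t - 1)
    return min(m_prev, F_mid[1])


def select(Zl, Fl, dFl, Zr, Fr, dFr, m):
    """Step 6: return (region to process next, region to stack)."""
    def rejected(F, dF):
        grad_excludes_zero = any(lo > 0.0 or hi < 0.0 for lo, hi in dF)
        return grad_excludes_zero or F[0] > m

    keep_l, keep_r = not rejected(Fl, dFl), not rejected(Fr, dFr)
    if keep_l and keep_r:
        # Smaller lower bound is more promising; ties go to smaller sup.
        if Fl[0] < Fr[0] or (Fl[0] == Fr[0] and Fr[1] > Fl[1]):
            return Zl, Zr
        return Zr, Zl
    if keep_l:
        return Zl, None
    if keep_r:
        return Zr, None
    return None, None


m = update_bound(MAXREAL, (2.0, 5.0))
both = select("L", (1.0, 4.0), [(-1.0, 1.0)], "R", (2.0, 3.0), [(-1.0, 1.0)], m)
too_big = select("L", (1.0, 4.0), [(-1.0, 1.0)], "R", (6.0, 9.0), [(-1.0, 1.0)], m)
print(m, both, too_big)
```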

In case the function f can attain smaller values than m on the boundary $\partial X$ of X at
points which are not critical, an alteration has to be made in the above procedure. The
value of m(t) is updated only when regions X* containing critical points x* are computed
by interval iteration. If $\sup F(x^*) < m(t - 1)$, then we set $m(t) = \sup F(x^*)$; otherwise
$m(t) = m(t - 1)$. The rejection criterion $\inf F(Z) > m(t)$ remains unaltered. The algorithm
will generally be slower than the one given above in this case, but usually still faster than
an exhaustive search for all critical points.

In the same way, a function M(t) giving the lower bound for the global maximum
M of f is constructed by setting M(0) = -MAXREAL, and updating M(t) to be
the maximum of M(t - 1) and $\inf F(x)$, assuming that the global maximum is critical.
Intervals are rejected if $\sup F(X) < M(t)$. Otherwise, M(t) is updated only at critical
points, as above. The modification of the choice algorithm for bisected intervals is done
by reversing inequality signs and interchanging infs with sups in the above.

9. Use of the Algorithm for Validation. In addition to its use for global searching,
the algorithm given in §6 can be used to validate solutions to optimization problems given
by other algorithms. For example, suppose that x is an approximate critical point of f
found by Newton’s method or some other numerical technique. Then, the initial region X
can be taken to be, say

(9.1)  $X = x + \delta \cdot e \cdot [-1, 1]$,

where $\delta > 0$ and $e = (1, 1, \ldots, 1) \in R^n$ denotes the vector with all components equal to one.
If (5.5) holds for this value of X, then all components of x are validated to be accurate to a
tolerance of $\delta$. Furthermore, the interval iteration (6.1) will give approximations of possibly
increased accuracy for the critical point, as well as bounds for the function, gradient vector,
and Hessian matrix values. It should be noted that the interval calculations furnish upper
and lower bounds for maximum and minimum values of the function, while some other
methods give only one-sided bounds.
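The validation idea can be illustrated in one dimension: build the box (9.1) around an approximate critical point and check the inclusion test. The example uses the hypothetical function f(x) = x^4/4 - x with critical point x* = 1 and ordinary floating-point arithmetic without directed rounding, so it demonstrates the logic only; a genuine validation would need rounded interval arithmetic as in (5.5).

```python
def validate(x_approx, delta):
    """Return (validated, K) for f(x) = x**4/4 - x near x_approx > delta."""
    X = (x_approx - delta, x_approx + delta)       # (9.1) with n = 1
    h = 3.0 * x_approx ** 2                        # f''(x) at the midpoint
    c = x_approx - (x_approx ** 3 - 1.0) / h       # Newton step from x
    lo2, hi2 = 3.0 * X[0] ** 2, 3.0 * X[1] ** 2    # F''(X), valid for X[0] > 0
    r = max(abs(1.0 - lo2 / h), abs(1.0 - hi2 / h)) * delta
    K = (c - r, c + r)                             # Krawczyk image of X
    return (X[0] <= K[0] and K[1] <= X[1]), K


ok, K_good = validate(1.0000001, 1e-3)   # accurate approximation
bad, K_bad = validate(1.01, 1e-3)        # approximation off by more than delta
print(ok, K_good, bad, K_bad)
```

In the first call the image lies inside the box, so the approximation is accepted to within the tolerance; in the second the image falls outside the box, which signals that the box around 1.01 contains no critical point.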

10. Numerical Examples. The functions selected for numerical computation were the
n-dimensional Rosenbrock function [1],

(10.1)  $f_n(x) = \sum_{i=1}^{n-1} \left[ 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right]$,

and the “three-humped camel” function [3],

(10.2)  $g(x) = 2x_1^2 - 1.05 x_1^4 + \frac{1}{6} x_1^6 - x_1 x_2 + x_2^2$.

The program used was written in Pascal-SC for a microcomputer with a Z80 processor
and the CP/M operating system. This was done to take advantage of support for interval
arithmetic, an already written library of operators and functions for type IHESSIAN, and
the utility procedure LGLI for solving linear systems with interval coefficient matrices and
right sides. On the other hand, the small amount of storage available in this machine (64
kilobytes) limited the values of n for the Rosenbrock function (10.1) to n = 2, 3. The
actual machine used was also rather slow, with a 1 MHz system clock, giving typical times
for floating-point interval addition and subtraction of 13.5 milliseconds, multiplication,
57.5 milliseconds, and division, 77.5 milliseconds. Nevertheless, the results given below
were obtained in a reasonable amount of time.

The most time-consuming part of the computation is the performance of the Krawczyk
transformation K(X) (actually, $\hat{K}(X)$), using the Pascal-SC utility program LGLI to solve
the system (5.1) with interval coefficient matrix and right side. A count is made of the
number of times this transformation is performed, the number of critical points found,
the number of regions rejected, and the number of regions (if any) in which exceptions
are encountered. The sum of the number of regions rejected (which cannot contain critical
points), the number of regions in which critical points are found, and the number of
exceptional regions gives the total number of subregions examined. The Krawczyk
transformation may be applied to a given region several times before it is accepted as containing
a critical point, or rejected.

The Rosenbrock function (10.1) has the global minimum $f_n(x^*) = 0$ at the critical
point $x^* = (1, 1, \ldots, 1)$. It is easy to find x* by Newton’s method, but methods which
try to reduce $f_n(x)$ at each step find this function rather difficult, particularly in higher
dimensions. Four cases were considered:

1.  n = 2,  $X_0$ = ([0.9, 1.2], [0.8, 1.1]);

2.  n = 2,  $X_0$ = ([-3.7, 1.4], [-1.6, 3.5]);

3.  n = 3,  $X_0$ = ([0.9, 1.2], [0.8, 1.1], [0.9, 1.2]);

4.  n = 3,  $X_0$ = ([-3.7, 1.4], [-1.6, 3.5], [-3.7, 1.4]).

The value $\epsilon = 0$ was taken in each case, and no exceptions occurred in any of the
examples given here. Searching for all critical points gave the following results:


                             Case 1   Case 2   Case 3   Case 4

Transformations                 242      957      553     1976
Rejected                        167      685      432     1567
Critical Points                   1        1        1        1
Transformations to Locate       128      187      328     1672

The algorithm was very busy in the neighborhood of the critical points x* = (1, 1) and
x* = (1, 1, 1). The region in which (5.5) holds turned out to be rather small, and nearby
regions not containing x* had to be made very small before they could be rejected with
certainty. The increase in area of X0 by a factor of 289 between cases 1 and 2 increased
the number of Krawczyk transformations required by a factor of less than four, while the
increase in volume of X0 between cases 3 and 4 by a factor of 4913 resulted in an even
smaller increase in the number of transformations, less than 3.6. Going from two to three
dimensions increased the number of transformations required to search the entire initial
region by a factor of about two in each case.
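
The factors quoted in this paragraph follow from the four initial regions and the transformation counts reported for the search for all critical points. A short check, written in Python purely for illustration (the names below are not taken from the original Pascal-SC program), might look like this:

```python
from math import prod

# Initial regions X0 for the four cases, as lists of (lower, upper) bounds.
regions = {
    1: [(0.9, 1.2), (0.8, 1.1)],
    2: [(-3.7, 1.4), (-1.6, 3.5)],
    3: [(0.9, 1.2), (0.8, 1.1), (0.9, 1.2)],
    4: [(-3.7, 1.4), (-1.6, 3.5), (-3.7, 1.4)],
}
# Krawczyk transformations needed to search each region for all critical points.
transformations = {1: 242, 2: 957, 3: 553, 4: 1976}


def measure(box):
    """Area (n = 2) or volume (n = 3) of an interval box."""
    return prod(hi - lo for lo, hi in box)


area_factor = measure(regions[2]) / measure(regions[1])    # about 289 = 17**2
volume_factor = measure(regions[4]) / measure(regions[3])  # about 4913 = 17**3
work_2d = transformations[2] / transformations[1]          # below 4
work_3d = transformations[4] / transformations[3]          # below 3.6
dim_small = transformations[3] / transformations[1]        # roughly 2
dim_large = transformations[4] / transformations[2]        # roughly 2
```

Each side of the large regions is 17 times the corresponding side of the small ones, which is where the factors 289 and 4913 come from.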

The  modification  of  the  program  to  search  for  a  global  critical  minimum  gave  the 
following  results: 


                             Case 1   Case 2   Case 3   Case 4

Transformations                  92      140      346      219
Rejected                         75      110      169      172
Critical Points                   1        1        1        1
Transformations to Locate        83      103      330      201

These  results  show  a  considerable  improvement  over  the  search  for  all  critical  points.  Once 
the  global  minimum  value  has  been  found,  remaining  regions  are  generally  rejected  quickly. 
The  algorithm  was  very  effective  for  the  largest  problem  considered,  Case  4  above. 

The required increase in minimum function values in the search for a global critical
maximum forced the algorithm toward the boundary of X0, where regions were quickly
rejected. The corresponding results were:

                             Case 1   Case 2   Case 3   Case 4

Transformations                   9       16       27       37
Rejected                          9       16       28       38
Critical Points                   0        0        0        0

In the last two cases, the final regions were rejected without having to perform a Krawczyk
transformation.

In two dimensions, the interval iteration to the critical point x* converged to

(10.3)  X^* = ([0.999999999999, 1.00000000001], [0.999999999999, 1.\ldots])

with

(10.4)  F_2(X^*) = [0.00, 9.62 \times 10^{-20}],

(10.5)  F_2'(X^*) = \begin{pmatrix} [-4.802 \times 10^{-9}, 1.242 \times 10^{-8}] \\ [-6.200 \times 10^{-9}, 2.400 \times 10^{-9}] \end{pmatrix},

(10.6)  F_2''(X^*) = \begin{pmatrix} [801.999999987, 802.000000030] & [-400.000000004, -399.999999998] \\ [-400.000000004, -399.999999998] & 200 \end{pmatrix}.

The midpoint of X* was calculated to be x̂ = (1, 1), with F2(x̂) = 0, F2'(x̂) = (0, 0)^T, and
the Hessian matrix

F_2''(\hat{x}) = \begin{pmatrix} 802 & -400 \\ -400 & 200 \end{pmatrix},

which are validated to be the exact values of f2(x*), ∇f2(x*), and Hf2(x*) by the above.
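
The exact midpoint values can be reproduced with ordinary floating-point arithmetic. The sketch below assumes that (10.1) is the standard extended Rosenbrock function, f_n(x) = sum over i = 1, ..., n-1 of 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2; the definition itself is not restated in this section, so this form, the Python language, and the function names are assumptions made for illustration.

```python
def rosenbrock(x):
    """Assumed form of (10.1): the standard extended Rosenbrock function."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def rosenbrock_gradient(x):
    """Analytic gradient of the assumed function."""
    n = len(x)
    g = [0.0] * n
    for i in range(n - 1):
        r = x[i + 1] - x[i] ** 2
        g[i] += -400.0 * x[i] * r - 2.0 * (1.0 - x[i])
        g[i + 1] += 200.0 * r
    return g


def rosenbrock_hessian(x):
    """Analytic Hessian of the assumed function (tridiagonal)."""
    n = len(x)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        h[i][i] += 1200.0 * x[i] ** 2 - 400.0 * x[i + 1] + 2.0
        h[i][i + 1] += -400.0 * x[i]
        h[i + 1][i] += -400.0 * x[i]
        h[i + 1][i + 1] += 200.0
    return h


value_2d = rosenbrock([1.0, 1.0])
gradient_2d = rosenbrock_gradient([1.0, 1.0])
hessian_2d = rosenbrock_hessian([1.0, 1.0])
```

Under this assumption the Hessian at (1, 1) has diagonal entries 802 and 200 and off-diagonal entries -400. The same functions accept a point of any dimension n >= 2. Unlike the interval computation, this check carries no guarantee against rounding error.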

The results in three dimensions were completely similar, with the midpoint x̂ = (1, 1, 1)
of X* giving the exact values f3(x̂) = 0, ∇f3(x̂) = (0, 0, 0)^T, and

(10.7)  Hf_3(\hat{x}) = \begin{pmatrix} 802 & -400 & 0 \\ -400 & 1002 & -400 \\ 0 & -400 & 200 \end{pmatrix}.

The three-humped camel function g(x) given by (10.2) has five critical points:

(10.8)  x^* = (0, 0),

which is its global minimum point,

(10.9)  \pm y^* = \pm\left( \sqrt{2.1 - \sqrt{0.91}},\ \tfrac{1}{2}\sqrt{2.1 - \sqrt{0.91}} \right),

which are saddle points, and

(10.10)  \pm z^* = \pm\left( \sqrt{2.1 + \sqrt{0.91}},\ \tfrac{1}{2}\sqrt{2.1 + \sqrt{0.91}} \right),

which are relative minimum points. One has

(10.11)  0 = g(x^*) < g(-z^*) = g(z^*) < g(-y^*) = g(y^*).

The search for all critical points was conducted in the initial region

(10.12)  X_0 = ([-2.0, 1.8], [-0.9, 1.0]),

with ε = 0. The total number of Krawczyk transformations required was 82, 48 intervals
were rejected, all 5 critical points were found, and there were no exceptions. This function
is less of a computational challenge than the Rosenbrock function f_n(x); however, the
critical points ±y* and ±z* tend to shield x* from straightforward iterative procedures,
such as Newton's method. Letting T(·) denote the number of Krawczyk transformations
required to locate a given critical point, the results were:

         T(x*) = 6,
         T(y*) = 21,
(10.13)  T(z*) = 45,
         T(-y*) = 69,
         T(-z*) = 79.


The search for the global minimum of g(x) in X0 required 60 Krawczyk transformations,
46 intervals were rejected, 2 critical points were located in order of decreasing
function value, and there were no exceptions. The critical points found were first -z* and
then the global minimum point x*, with

(10.14)  T(-z*) = 10,
         T(x*) = 57.

Obviously, the search took an entirely different path than the exhaustive search (10.13)
for all critical points of g(x) in X0.

The search for the global critical maximum of g(x) was somewhat slower, due to the
fact that g(x) attains its maximum at a noncritical point on the boundary ∂X0 of X0.
Consequently, the function M(t) which gives a lower bound for the critical maximum was
updated only at critical points. This computation required 76 Krawczyk transformations,
50 intervals were rejected, and three critical points (x*, y*, and -y*) were found in order
of nondecreasing function values. The numbers of transformations required were:

         T(x*) = 6,
(10.15)  T(y*) = 21,
         T(-y*) = 64.

The critical points ±z* were also located, but rejected, because the values of g(x) at
these relative minimum points are smaller than at the saddle points ±y*. Investigation of
the nature of critical points is done by finding bounds for the eigenvalues of all symmetric
matrices A such that A ∈ G''(X*), using the Pascal-SC procedure EIGEN. Denoting the
eigenvalues of a 2 x 2 symmetric matrix A by λ1(A) and λ2(A), intervals Λ1(X*) and
Λ2(X*) are computed by EIGEN such that

(10.16)  {λi(A) | A ∈ G''(X*)} ⊂ Λi(X*),  i = 1, 2.

The character of the critical point x* ∈ X* can be decided on the basis of these bounds.
The results of the interval iteration to critical points were:


(10.17)
X^* = ([-2.0 \times 10^{-99}, 2.0 \times 10^{-99}], [-2.0 \times 10^{-99}, 2.0 \times 10^{-99}]),
G(X^*) = [-2.05 \times 10^{-99}, \ldots.00 \times 10^{-99}],
G'(X^*) = \begin{pmatrix} [-1.52 \times 10^{-98}, 1.52 \times 10^{-98}] \\ [-6.00 \times 10^{-99}, 6.00 \times 10^{-99}] \end{pmatrix},
G''(X^*) = \begin{pmatrix} [3.99999999999, 4.00000000000] & -1 \\ -1 & 2 \end{pmatrix}.

The eigenvalues of Hg(x*) are contained in the intervals

(10.18)
\Lambda_1(X^*) = [4.4142135623, 4.4142135625],
\Lambda_2(X^*) = [1.58578643759, 1.58578643766],

which proves conclusively that the critical point x* ∈ X* is a minimum point, because
both eigenvalues of Hg(x*) ∈ G''(X*) must be positive [1]. The midpoint of X* is x̂ =
x* = (0, 0), with G(x̂) = g(x*) = 0, G'(x̂) = ∇g(x*) = (0, 0)^T, and

G''(\hat{x}) = Hg(x^*) = \begin{pmatrix} 4 & -1 \\ -1 & 2 \end{pmatrix}.
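
The classification step can be illustrated without the Pascal-SC procedure EIGEN. In the Hessian enclosures reported in this section only the upper-left entry is a proper interval, and both eigenvalues of a symmetric matrix are nondecreasing in a diagonal entry, so evaluating the closed-form 2 x 2 eigenvalues at the two endpoints brackets them. The sketch below is in Python with ordinary floating point, so unlike EIGEN it is not a validated computation; the function names are illustrative.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def eigenvalue_bounds(a_lo, a_hi, b, c):
    """Bounds for both eigenvalues when only the entry a ranges over [a_lo, a_hi]."""
    lo_small, lo_large = eigenvalues_2x2(a_lo, b, c)
    hi_small, hi_large = eigenvalues_2x2(a_hi, b, c)
    return (lo_small, hi_small), (lo_large, hi_large)


def classify(bounds):
    """Decide the character of a critical point from its eigenvalue bounds."""
    (small_lo, small_hi), (large_lo, large_hi) = bounds
    if small_lo > 0:
        return "minimum"
    if large_hi < 0:
        return "maximum"
    if small_hi < 0 < large_lo:
        return "saddle"
    return "undecided"


# Upper-left Hessian entries as reported for X* in (10.17) and for Y* in (10.19).
at_x = classify(eigenvalue_bounds(3.99999999999, 4.00000000000, -1.0, 2.0))
at_y = classify(eigenvalue_bounds(-3.87308929305, -3.87308929080, -1.0, 2.0))
```

With these inputs the first enclosure is classified as a minimum and the second as a saddle, in agreement with the conclusions drawn in the text.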

Next,

(10.19)
Y^* = ([1.07054229181, 1.07054229185], [0.535271145904, 0.535271145921]),
G(Y^*) = [0.877361557501, 0.877361558041],
G'(Y^*) = \begin{pmatrix} [-5.81 \times 10^{-10}, 5.26 \times 10^{-10}] \\ [-5.00 \times 10^{-11}, 4.00 \times 10^{-11}] \end{pmatrix},
G''(Y^*) = \begin{pmatrix} [-3.87308929305, -3.87308929080] & -1 \\ -1 & 2 \end{pmatrix}.

The eigenvalues of Hg(y*) are contained in the intervals

(10.20)
\Lambda_1(Y^*) = [-4.03868818, -4.03868818],
\Lambda_2(Y^*) = [2.165598876, 2.165598884],

so that the critical point y* ∈ Y* is indubitably a saddle point.

The next values obtained were

(10.21)
Z^* = ([1.74755234581, 1.74755234586], \ldots),
G(Z^*) = [0.298638440884, 0.298638443572],
G'(Z^*) = \begin{pmatrix} [-2.18 \times 10^{-10}, 3.82 \times 10^{-10}] \\ [-1.00 \times 10^{-11}, 0.00] \end{pmatrix},
G''(Z^*) = \begin{pmatrix} [1.21530892921, 1.21530892930] & -1 \\ -1 & 2 \end{pmatrix}.

The eigenvalues of Hg(z*) are contained in the intervals

(10.22)
\Lambda_1(Z^*) = [5.33440787, 5.33440793],
\Lambda_2(Z^*) = [2.681868136, 2.681868143],

and thus the critical point z* is a relative minimum point.

The computed intervals

(10.23)  -Y^* = ([-1.07054229185, -1.07054229181], [-0.535271145923, -0.535271145906])

and

(10.24)  -Z^* = ([-1.74755234585, -1.74755234580], [-0.873776172925, -0.873776172902])

contain the critical points -y* and -z*, respectively. The function, gradient, and Hessian
values on these intervals do not differ significantly from the corresponding ones for Y*
and Z*. In particular, the eigenvalues of Hg(-y*) lie in the intervals (10.20), while the
eigenvalues of Hg(-z*) belong to the intervals (10.22). Thus, -y* is guaranteed to be a
saddle point, and -z* a relative minimum point of g.


ABSTRACT: In this paper, we present an algorithm for computing
k-terminal reliability in directed or undirected networks. The
running time of the algorithm is bounded by a polynomial in the
number of (s,K)-quasicuts, where an (s,K)-quasicut is defined to
be a minimal (s,K*)-cut for some K* contained in K. The
algorithm can be implemented by applying our cut based 2-terminal
algorithm to a network to which a super-sink node has been added.
Thus, a 2-terminal code based on our cut based algorithm could be
used to solve the k-terminal problem.

1. INTRODUCTION: We present a new algorithm for computing the
k-terminal network reliability measure. The algorithm is a
generalization of an algorithm we previously described in (Provan
and Ball 1984) for the 2-terminal reliability measure. The
2-terminal algorithm was bounded by a polynomial in the size of the
network and the number of minimal (s,t)-cuts. An analogous
result for the k-terminal problem would be an algorithm bounded
by a polynomial in the size of the network and the number of
minimal (s,K)-cuts. We showed in (Provan and Ball 1984) that the
existence of such an algorithm is unlikely since it would imply
that P = NP. The algorithm given in this paper has a time bound
which is polynomial in the size of the network and the number of
(s,K)-quasicuts, which are arc sets closely related to minimal
(s,K)-cuts. A desirable feature of this algorithm and its
development is that the algorithm can be realized by applying the
2-terminal algorithm to a graph in which a super sink node has
been added. Thus, a 2-terminal code based on our cut procedure
could be used to solve the k-terminal problem.


2. PRELIMINARY DEFINITIONS: For network reliability problems,
we are given an underlying network G = (N,A), either directed or
undirected, with node set N = {v1, v2, ..., vm} and arc set
A = {e1, e2, ..., en}. An arc is denoted by (vi,vj), which is taken
to be an ordered or unordered pair depending on whether G is
directed or undirected. We assume that each arc e ∈ A fails
independently with known failure probability pe. Our reliability
measures are defined by the existence of operating paths, paths
of operating arcs, between certain pairs of nodes in G. We
define, for nodes s and t of G, the event

EP(s,t) = [there exists an operating path from s to t].

The path is taken to be directed when G is directed, and there is
always an operating path from s to itself. The problem of
computing Pr[EP(s,t)] is called the (s,t)-connectedness problem
or the 2-terminal reliability problem. We extend this definition
to a general terminal set as follows. Given a K ⊆ N and an s ∈ K,
define the event

EP(s,K) = [there exists an operating path from s to every
node of K]

= ∩_{t∈K} EP(s,t).

The problem of computing Pr[EP(s,K)] is called the
(s,K)-connectedness problem or the k-terminal reliability problem, where
k = |K|. When K = N, it is also called the all-terminal reliability
problem.
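
For very small networks the quantity Pr[EP(s,K)] can be computed directly from these definitions by summing over every operating/failed state of the arcs. The Python sketch below does this for directed networks only; the function names and the three-arc example are illustrative assumptions, and the method is exponential in the number of arcs, so it serves as a reference computation rather than as the algorithm of this paper.

```python
from itertools import product


def reachable(start, arcs):
    """Nodes reachable from start along the given directed arcs."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def k_terminal_reliability(arcs, p, s, K):
    """Pr[EP(s,K)] by summing over all 2**len(arcs) arc states.

    p[e] is the failure probability of arc e; arcs fail independently.
    """
    total = 0.0
    for state in product([True, False], repeat=len(arcs)):
        prob = 1.0
        for e, up in zip(arcs, state):
            prob *= (1.0 - p[e]) if up else p[e]
        operating = [e for e, up in zip(arcs, state) if up]
        if set(K) <= reachable(s, operating):
            total += prob
    return total


# Hypothetical example: three arcs, each failing with probability 0.1.
arcs = [("s", "a"), ("s", "b"), ("a", "b")]
p = {e: 0.1 for e in arcs}
two_terminal = k_terminal_reliability(arcs, p, "s", ["s", "b"])
all_terminal = k_terminal_reliability(arcs, p, "s", ["s", "a", "b"])
```

In the example, node b is cut off only when arc (s,b) fails and the path through a is broken, so the 2-terminal value exceeds the all-terminal one.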

Each of the measures given above has the property that the
underlying system is coherent with respect to that measure; that
is, if the system operates when a set S of components (arcs)
operates, then it operates when any superset of S operates.
Coherent systems can be described completely by listing either
(1) the collection of minimal sets of components whose operation
allows system operation, or (2) the collection of minimal sets of
components whose failure causes system failure. We call these
sets, respectively, the pathsets and cutsets of the system. In
the case of the (s,t)-connectedness problem, the pathsets are
simply the (s,t)-paths, and the cutsets are the (s,t)-cuts, i.e.
minimal sets of arcs whose removal disconnects s and t. We
denote the set of all (s,t)-cuts by 𝒞(s,t). In the case of the
(s,K)-connectedness problem, the pathsets are (s,K)-trees (also
called K-trees), i.e. minimal sets of arcs that include paths
from s to all members of K. The cutsets are (s,K)-cuts, i.e.
minimal sets of arcs that disconnect s from some node in K. We
denote by 𝒞(s,K) the set of all (s,K)-cuts.

Many approaches to reliability analysis problems involve
either the explicit or implicit enumeration of all cutsets or
pathsets. Surprisingly, even if one is willing to invest the
effort required to enumerate all pathsets or cutsets, the problem
of computing system reliability may still be difficult,
specifically NP-hard (see Ball 1986 for a discussion of these and
other complexity issues). The algorithm we give in (Provan and
Ball 1984) has a running time bounded by a polynomial in the
network size and the number of (s,t)-cuts. Thus, for the
2-terminal problem, if one is willing to expend the effort to
enumerate cutsets then one can efficiently compute system
reliability. On the other hand, we show that such an algorithm
can exist for the k-terminal problem only if P = NP. These results
and related results presented in that paper all assume that the
network itself is given as input. In (Ball and Provan 1986), we
address a related problem where the input is given as an explicit
list of all pathsets or cutsets.

In this paper, we extend our 2-terminal algorithm to the
k-terminal problem. The time bound obtained is not a polynomial in
the number of cutsets, (s,K)-cuts in this case, but rather a
polynomial in the number of (s,K)-quasicuts, which are arc sets
closely related to (s,K)-cuts. We now give definitions required
to present the algorithm and its time bound.

3. THE ALGORITHM: The algorithm itself is a particular
application of the 2-terminal algorithm. The 2-terminal
algorithm can be described in terms of a recursive update
formula. After giving some necessary definitions we state the
update formula and then show how it can be applied to the
k-terminal problem and derive its time bound.

Given a network G = (N,A) and a subset of nodes S, define the
subnetwork G(S) induced by S to be the node set S together with
all arcs between nodes of S. For node sets U and V, define
A(U,V) = {(u,v) ∈ A : u ∈ U, v ∈ V}. For any (s,t)-cut C, we identify the
two sets:

SN(C) = {u ∈ N : there exists a path from s to u containing no
arcs of C},

TN(C) = {v ∈ N : there exists a path from v to t containing no
arcs of C},

and note that (i) SN(C) and TN(C) are disjoint (although they do
not necessarily comprise all nodes of G), and (ii)
C = A(SN(C),TN(C)). The set of exit nodes associated with C is
defined to be SE(C) = {u ∈ SN(C) : there exists an arc (u,v) with
v ∈ TN(C)}.

We now define some random events necessary for the statement
of the update formula. For any C ⊆ A, let E(C) be the event that
all arcs in C fail. For any C ∈ 𝒞(s,t) define

EC(C) = EP(s,SE(C)) ∩ E(C)

= [there is an operating path from s to all nodes in
SE(C), but to no node of TN(C)].

The following two results, which are proved in (Provan and
Ball 1984), form the basis of the 2-terminal algorithm.

Theorem 1:

\Pr[EP(s,t)] = 1 - \sum_{C \in \mathcal{C}(s,t)} \Pr[EC(C)].    (1)

Theorem 2: For any C ∈ 𝒞(s,t),

\Pr[EC(C)] = \prod_{e \in C} p_e \left( 1 - \sum_{C' \in \mathcal{C}(s,t),\ SN(C') \subset SN(C)} \frac{\Pr[EC(C')]}{\prod_{e \in C' \cap C} p_e} \right),    (2)

where \prod_{e \in C' \cap C} p_e is defined to be 1 if C' ∩ C = ∅.
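
Theorem 1 can be checked numerically on a small directed network by computing every probability by brute force from the definitions of SN(C), TN(C), SE(C) and EC(C). The Python sketch below does so. The five-arc network, the failure probabilities and the function names are assumptions chosen for illustration; it does not use the update formula (2), and it enumerates all arc subsets and states, so it is only practical for tiny examples.

```python
from itertools import combinations, product


def reachable(start, arcs):
    """Nodes reachable from start along the given directed arcs."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def minimal_cuts(arcs, s, t):
    """All minimal (s,t)-cuts, found by testing arc subsets in order of size."""
    cuts = []
    for size in range(1, len(arcs) + 1):
        for subset in combinations(arcs, size):
            chosen = frozenset(subset)
            if any(cut <= chosen for cut in cuts):
                continue  # contains a smaller cut, so it is not minimal
            if t not in reachable(s, [e for e in arcs if e not in chosen]):
                cuts.append(chosen)
    return cuts


def exit_nodes(arcs, cut, s, t):
    """SE(C): nodes of SN(C) that have an arc into TN(C)."""
    kept = [e for e in arcs if e not in cut]
    sn = reachable(s, kept)
    tn = reachable(t, [(b, a) for a, b in kept])  # reversed arcs: nodes that reach t
    return {a for a, b in arcs if a in sn and b in tn}


def probability(arcs, p, event):
    """Total probability of the states (sets of operating arcs) satisfying event."""
    total = 0.0
    for state in product([True, False], repeat=len(arcs)):
        prob = 1.0
        for e, up in zip(arcs, state):
            prob *= (1.0 - p[e]) if up else p[e]
        if event({e for e, up in zip(arcs, state) if up}):
            total += prob
    return total


arcs = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
p = {e: 0.1 * (i + 1) for i, e in enumerate(arcs)}  # distinct failure probabilities

cuts = minimal_cuts(arcs, "s", "t")
pr_ep = probability(arcs, p, lambda up: "t" in reachable("s", up))
pr_ec = [
    probability(
        arcs, p,
        lambda up, c=c, se=exit_nodes(arcs, c, "s", "t"):
            not (up & c) and se <= reachable("s", up))
    for c in cuts
]
theorem_1_rhs = 1.0 - sum(pr_ec)
```

On this network there are four (s,t)-cuts, and pr_ep agrees with theorem_1_rhs up to rounding, because every failed state falls into exactly one event EC(C).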

Expression (1) is a formula for computing Pr[EP(s,t)] in
which the number of terms equals the number of (s,t)-cuts.
Expression (2) is an update formula for recursively computing
each of the terms in expression (1). Note that in order to
compute Pr[EC(C)] using (2) a large number of "previous"
(s,t)-cuts C' must be considered. As a result expressions (1) and (2)
lead to an algorithm whose complexity is proportional to the
number of (s,t)-cuts squared. The specific complexity given in
(Provan and Ball 1984) is O((n+m)μ²), where μ = |𝒞(s,t)|.

Let us now consider applying the 2-terminal algorithm to the
k-terminal problem. Given the input network G = (N,A) together
with the source node s, terminal set K and probability vector
(pe), we modify the network using a "super sink" node t as
follows. Construct network G' = (N',A') from G by setting N' = N ∪ {t}
and A' = A ∪ C0, where C0 = {(x,t) : x ∈ K}. For each e ∈ C0 we set pe to 1,
so that Pr[E'(C0)] = 1 in the computation below.
Figure 1 illustrates this transformation.

Now consider the effect of applying the 2-terminal algorithm
to G'. Since the algorithm computes Pr[EC'(C)] for all
(s,t)-cuts C, it will, in particular, compute Pr[EC'(C0)], where '
indicates an event or set associated with G'. But we have

Pr[EC'(C0)] = Pr[EP(s,SE'(C0))] · Pr[E'(C0)]

= Pr[EP(s,K)] · 1.

The last term is, of course, the k-terminal reliability measure
computed over G. Although Figure 1 illustrates the
transformation for directed networks, it also applies to
undirected networks.

The number of terms in expression (1) will be the number of
(s,t)-cuts in G'. In order to interpret this quantity relative
to other k-terminal algorithms we need to characterize this
quantity in terms of G and the k-terminal problem. Given G, s
and K, we define an (s,K)-quasicut to be a set of arcs that is an
(s,K*)-cut for some K* ⊆ K. We denote by 𝒞q(s,K) the set of all
(s,K)-quasicuts. Note that 𝒞q(s,K) ⊇ 𝒞(s,K). Of course, all
(s,K)-quasicuts disconnect s from some member of K; however, not
all quasicuts are minimal with respect to this property. For any
(s,K)-quasicut C, we define

J(C) = K ∩ SN(C)

K(C) = K - SN(C) (≠ ∅).

We now show that the work performed by the 2-terminal algorithm
when applied to the k-terminal problem is proportional to the
number of quasicuts.

FIGURE 1: Transformation for solving k-terminal problem using
cut based 2-terminal algorithm.

Theorem 3: Given G, G', K and s as defined above, the number of
terms in expression (1) when the 2-terminal algorithm is applied
to G' equals the number of (s,K)-quasicuts in G plus one.

Proof: We again use the ' notation to denote quantities in G'.
We define an isomorphism Φ: 𝒞'(s,t) - {C0} → 𝒞q(s,K) by setting
Φ(C) = C ∩ A.

(Φ into): Let C ≠ C0 be an (s,t)-cut in G'. Then
C = A'(SN'(C),TN'(C)) and TN'(C) ⊃ {t}. Then it must be that
TN'(C) ∩ K ≠ ∅, so that C ∩ A is an (s,TN'(C) ∩ K)-cut, and hence an
(s,K)-quasicut.

(Φ one to one): Let C1, C2 ∈ 𝒞'(s,t) - {C0}, and suppose
C1 ∩ A = C2 ∩ A = C. Then there are paths comprised of arcs of A - C which
go from s to every node of J(C), and no such paths going to any
node of K(C). Thus, the set D = {(x,t) : x ∈ J(C)} must be contained
in both C1 and C2, and no arc of C0 - D can be contained in either
C1 or C2. It follows that C1 - A = C2 - A = D, so that C1 = C2.

(Φ onto): Let C ∈ 𝒞q(s,K) and set C' = C ∪ {(x,t) : x ∈ J(C)}. Then
clearly, C' is an (s,t)-cut in G', and since K(C) ≠ ∅, C' ≠ C0.

■

It immediately follows that

Corollary 4: Given appropriate G, s, K and (pe), the k-terminal
network reliability problem can be solved in O((n+m)μq²) time,
where μq = |𝒞q(s,K)|, by applying the cut based 2-terminal
algorithm to G' as described above.
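
The count in Theorem 3 can be checked by brute force on a small directed example. The Python sketch below builds G' by adding super-sink arcs, enumerates the minimal (s,t)-cuts of G', and enumerates the (s,K)-quasicuts of G. It reads "an (s,K*)-cut" in the quasicut definition as a minimal arc set whose removal separates s from every node of K*, which is equivalent to requiring that the set minimally separates s from K(Q) = K - SN(Q) with K(Q) nonempty. That reading is an assumption adopted here because it is the one under which the map Φ of the proof is a bijection on this example; the network and all function names are likewise illustrative.

```python
from itertools import combinations


def reachable(start, arcs):
    """Nodes reachable from start along the given directed arcs."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def minimal_cuts(arcs, s, t):
    """All minimal (s,t)-cuts, found by testing arc subsets in order of size."""
    cuts = []
    for size in range(1, len(arcs) + 1):
        for subset in combinations(arcs, size):
            chosen = frozenset(subset)
            if any(cut <= chosen for cut in cuts):
                continue  # contains a smaller cut, so it is not minimal
            if t not in reachable(s, [e for e in arcs if e not in chosen]):
                cuts.append(chosen)
    return cuts


def quasicuts(arcs, s, K):
    """Arc sets Q that minimally separate s from K(Q) = K - SN(Q), K(Q) nonempty."""
    found = []
    for size in range(1, len(arcs) + 1):
        for subset in combinations(arcs, size):
            kept = [e for e in arcs if e not in subset]
            cut_off = set(K) - reachable(s, kept)  # K(Q)
            if not cut_off:
                continue
            # Minimality: restoring any one arc reconnects s to a node of K(Q).
            if all(cut_off & reachable(s, kept + [e]) for e in subset):
                found.append(frozenset(subset))
    return found


def add_super_sink(arcs, K, t="t"):
    """Arc list of G': the arcs of G plus C0 = {(x, t) : x in K}."""
    return arcs + [(x, t) for x in K]


arcs = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c")]
K = ["s", "a", "c"]
cuts_in_g_prime = minimal_cuts(add_super_sink(arcs, K), "s", "t")
quasi = quasicuts(arcs, "s", K)
```

For this example G' has six (s,t)-cuts and G has five (s,K)-quasicuts. Intersecting every cut other than C0 with A reproduces the quasicuts exactly, as the proof describes.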

We should note that the construction illustrated in Figure 1
does not provide a general way of transforming a k-terminal
problem into a 2-terminal problem. It is useful in this case
only because of the particular intermediate calculations made by
our cut based algorithm.

This approach to the k-terminal problem provides a common
generalization of our cut based algorithm for the 2-terminal
problem and Buzacott's algorithm for the all-terminal problem
(see Buzacott 1980, 1983). That is, when the approach described
in this paper is applied to the 2-terminal problem its steps are
essentially equivalent to the steps of our cut based 2-terminal
algorithm and when it is applied to the all-terminal problem on a
complete network its steps are essentially equivalent to the
steps of Buzacott's algorithm. In this case, the running times
of both algorithms are polynomial in the number of cutsets since
for complete networks (s,K)-quasicuts are (s,K)-cuts. We should
note that for non-complete networks our algorithm will require
substantially less computational effort than Buzacott's.
There are two interesting areas for further research
suggested by these results. First of all, it would be quite
useful to develop a cut based algorithm that was linear in the
number of cuts rather than quadratic. Such an algorithm would be
particularly significant when one considers the exponential
growth rate of the number of (s,t)-cuts in a network. A second
line of research involves the union of products problem defined
in (Ball and Provan 1986). The input to this problem is either a
list of pathsets or cutsets. We were able to adapt an
all-terminal network reliability algorithm whose running time is
proportional to the number of pathsets (spanning trees in this
case) to obtain an efficient general approximate algorithm and an
efficient special case exact algorithm for this problem. It
would seem that the cut based algorithm could also be adapted to
the union of products problem context.


Abstract. We consider relaxations of the greedy heuristic for computing
minimum-weight perfect matching of complete graphs with edge weights satisfying
the triangle inequality. We analyze their worst-case error bounds and time
complexities, and report on computational results for a set of randomly-generated
points in the Euclidean plane.

Keywords: Approximate algorithms, complexity, heuristics, weighted perfect
matching.

1. Introduction.

A perfect matching of an edge-weighted complete graph K_n, n even, is a subset M of n/2 of
its edges such that no two edges in M are incident upon the same vertex. Its weight w(M)
is the total weight of its edges. The minimum weight perfect matching problem is to find
a matching M* of minimum weight. The problem may be solved exactly in O(n^3) time by
Edmonds' algorithm [3, 4] as suggested by Gabow and Lawler [5, 10].

When the edge weights satisfy the triangle inequality, the well-known (ordinary) greedy
heuristic may be used to obtain an approximate perfect matching. This heuristic repeatedly
selects an edge of least weight into the matching, and removes from the graph its two
vertices and all edges incident upon them. Reingold and Tarjan [11] have shown that the
weight of the so-obtained approximate solution is at most (4/3) n^{log2(3/2)} - 1 times that
of the optimal matching.
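
A minimal Python sketch of the ordinary greedy heuristic for points in the plane is given below. It assumes Euclidean edge weights and an even number of points, and it uses a naive closest-pair search at every stage, so it illustrates the selection rule rather than an efficient implementation. The function names and the four-point example are illustrative.

```python
import math
from itertools import combinations


def greedy_matching(points):
    """Ordinary greedy: repeatedly match the closest pair of unmatched points.

    Assumes len(points) is even. Returns a list of index pairs.
    """
    unmatched = set(range(len(points)))
    matching = []
    while unmatched:
        u, v = min(combinations(sorted(unmatched), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        matching.append((u, v))
        unmatched -= {u, v}
    return matching


def weight(points, matching):
    """Total Euclidean weight w(M) of a matching."""
    return sum(math.dist(points[u], points[v]) for u, v in matching)


# Four collinear points on which greedy is far from optimal.
points = [(0, 0), (11, 0), (20, 0), (31, 0)]
greedy = greedy_matching(points)
greedy_weight = weight(points, greedy)
optimal_weight = weight(points, [(0, 1), (2, 3)])
```

On this example greedy first takes the short middle edge of weight 9 and is then forced to take the edge of weight 31, for a total of 40 against an optimal weight of 22.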

The edge selection criterion at each stage j = 1, ..., n/2 may also be viewed as identifying a
set S_j of edges from each vertex to its least-weight neighbor and then selecting an edge of
minimum weight in S_j. In this context, the question naturally arises whether examining
fewer vertices and weakening the minimization requirement may lead to greedy heuristics of
reasonable performance. In this paper, we discuss the behavior of an exponential number
of such greedy heuristics proposed in [7], having time complexities ranging from O(n^2) to
O(n^2 log n), and with worst-case error bounds ranging from 2^{n/2} - 1 to n/2. The latter is a
bound for a subclass of these heuristics to which the ordinary greedy belongs.

* This research was supported in part by the National Science Foundation under Grant No. MCS-8113503.

In the next section, we define the class of weak greedy heuristics which select one edge at
each stage, and we derive worst-case error bounds. In section 3, we examine a number of
special cases of interest. In section 4, we discuss the time complexity of these heuristics.
In the last section, we present computational results for randomly generated points in the
Euclidean plane. Surprisingly, these experiments indicate that there is very little variation
in the average performance of these heuristics, regardless of the amount of work spent in
selecting the matching edge at each stage. We conclude with a brief discussion of a much
stronger class of matching heuristics which select several edges at each stage [8].

2. The class of weak greedy heuristics

As a measure of performance, we shall use the ratio f(n) of the weight of the approximate
perfect matching M, produced by any heuristic, to that of the optimal matching M*, i.e.
f(n) = w(M) / w(M*). We define f(n) = 1 if w(M) = w(M*) = 0. We shall refer to any heuristic
algorithm that sequentially selects into the matching M edges e_j, j = 1, ..., n/2, one at a
time, as weak greedy. If, at any of its stages, a weak greedy selects an edge e_j = (u, v)
which is not a least-weight edge among those incident on u or v, then f(n) may become
infinite. For instance, consider the case where the n vertices consist of n/2 distinct points
in the plane, and their duplicates. Thus, the weight of each selected edge must be a
finite multiple of w(M*_j), where M*_j is an optimal matching of the vertices unmatched by M
immediately prior to the selection of e_j. We denote by GREEDY(α_1, ..., α_{n/2}) those weak
greedy heuristics for which w(e_j) ≤ α_j w(M*_j), where 0 < α_j < ∞, j = 1, ..., n/2.
Furthermore, we let M_j be the set of matching edges {e_j, ..., e_{n/2}} that would be selected
by such a heuristic at stages j through n/2, and let the corresponding ratios be
f(n-2j+2) = w(M_j) / w(M*_j), where M*_j is as defined earlier.

Theorem 1: For any given GREEDY(α_1, ..., α_{n/2}),

f(n-2j+2) ≤ α_j + (1 + α_j) f(n-2j), j = 1, ..., n/2-1, and f(2) ≤ α_{n/2}.

Proof: For j = 1, ..., n/2-1 we have f(n-2j+2) = w(M_j) / w(M*_j) = (w(e_j) + w(M_{j+1})) / w(M*_j) ≤
α_j + w(M_{j+1}) / w(M*_j). It suffices to show that w(M_{j+1}) / w(M*_j) ≤ (1 + α_j) f(n-2j), or
equivalently that

w(M*_{j+1}) ≤ (1 + α_j) w(M*_j).    (1)

Let e_j = (u,v) be the edge selected at the j-th stage of the heuristic. If an optimal perfect
matching also contains e_j, then (1) holds trivially, since in this case
w(e_j) + w(M*_{j+1}) = w(M*_j). Otherwise, the optimal matching must contain edges e' = (u,u')
and e'' = (v,v'). Let e be the edge (u',v'). Then, M' = M*_j ∪ {e} \ {e',e''} forms a perfect
matching of the vertices matched by M*_{j+1}. Clearly, w(M*_{j+1}) ≤ w(M') = w(M*_j) + w(e) - w(e') - w(e'').
By triangle inequality we have w(e) ≤ w(e') + w(e'') + w(e_j), so that
w(M*_{j+1}) ≤ w(M*_j) + w(e_j) ≤ (1 + α_j) w(M*_j), and hence (1) holds for all
j = 1, ..., n/2-1. Clearly, f(2) ≤ α_{n/2}. ■

In order to derive a bound on the f(n) of GREEDY(α_1, ..., α_{n/2}) heuristics, we require the
following lemma:

Lemma 1: Suppose that g(n-2j+2) = α_j + (1 + α_j) g(n-2j) for j = 1, ..., n/2-1 and g(2) = α_{n/2}.
Then, g(n) = ∏_{j=1}^{n/2} (1 + α_j) - 1.

Proof: Let g*(n-2j+2) = g(n-2j+2) + 1 for j = 1, ..., n/2. Then,

g*(n-2j+2) = (1 + α_j) g*(n-2j) for j = 1, ..., n/2-1, with g*(2) = 1 + α_{n/2}.    (2)

Solving (2) recursively, we obtain g*(n) = ∏_{j=1}^{n/2} (1 + α_j). ■

Theorem 2: For any GREEDY(α_1, ..., α_{n/2}), the ratio f(n) ≤ ∏_{j=1}^{n/2} (1 + α_j) - 1.

Proof: Considering the definitions of f(n) in Theorem 1 and g(n) in Lemma 1, it suffices
to show that g(n) ≥ f(n). This is certainly the case for n = 2. Suppose g(n-2) ≥ f(n-2).
Then, g(n) = α_1 + (1 + α_1) g(n-2) ≥ α_1 + (1 + α_1) f(n-2) ≥ f(n). ■
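
The bound of Theorem 2 is easy to evaluate exactly for particular parameter choices. The Python sketch below uses exact rational arithmetic and compares the closed form with the recurrence of Lemma 1 for two choices: α_j = 1 for every stage, and α_j = 2/(n-2j+2), the value Lemma 2 in the section on special cases gives when k_j = n-2j+2. The function names and the choice n = 12 are illustrative.

```python
from fractions import Fraction


def worst_case_bound(alphas):
    """Theorem 2 bound: product of (1 + alpha_j) over all stages, minus 1."""
    bound = Fraction(1)
    for a in alphas:
        bound *= 1 + Fraction(a)
    return bound - 1


def recurrence_bound(alphas):
    """g(n) from Lemma 1, unrolled from g(2) = alpha_{n/2} back to stage 1."""
    g = Fraction(alphas[-1])
    for a in reversed(alphas[:-1]):
        g = Fraction(a) + (1 + Fraction(a)) * g
    return g


n = 12
weakest = [1] * (n // 2)
strongest = [Fraction(2, n - 2 * j + 2) for j in range(1, n // 2 + 1)]

weakest_bound = worst_case_bound(weakest)      # 2**(n/2) - 1
strongest_bound = worst_case_bound(strongest)  # n/2
```

For the second choice the product telescopes: the factors are 1 + 1/i for i = n/2, ..., 1, whose product is n/2 + 1. These two values are the ends of the range 2^{n/2} - 1 to n/2 quoted in the introduction.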

S.  Special  cases 

Let $k_j$ be an integer satisfying $1 \le k_j \le n-2j+2$ for each $j = 1, \ldots, n/2$. At each stage $j$, such
a heuristic arbitrarily selects $k_j$ distinct vertices, determines the multiset $S_j$ of edges which
connect these $k_j$ vertices to their least-weight neighbors, and selects an edge, say, $e_j \in S_j$,
whose weight $w(e_j)$ does not exceed the average of all edges in $S_j$. Then, it places $e_j$ into
the matching, and removes from the graph its two vertices, along with all edges incident
upon them.

Lemma 2: For each $j = 1, \ldots, n/2$, the parameter $\alpha_j \le 1$ if $k_j = 1$, and $\alpha_j \le 2/k_j$ otherwise.

Proof: Clearly, for $k_j = 1$ we have $w(e_j) \le w(M^*_j)$. Now, suppose $k_j > 1$, and let $(u,v)$ be
an edge in $M^*_j$. If $u'$ and $v'$ are least-weight neighbors of $u$ and $v$, respectively, then
$w(u,u') + w(v,v') \le 2w(u,v)$. At each stage $j$, we must then have $\sum_{e \in S_j} w(e) \le 2w(M^*_j)$,
and due to the particular selection of edge $e_j$, we have $k_j\, w(e_j) \le 2w(M^*_j)$. ■
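
The stage-by-stage description above can be sketched in code. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `greedy_k`, the use of Euclidean points (so that the triangle inequality holds), and the rule "take the first edge of $S_j$ whose weight does not exceed the average" are all choices made here for concreteness.

```python
import math
import random

def greedy_k(points, ks, seed=0):
    """Sketch of GREEDY(k_1, ..., k_{n/2}) for points in the plane.

    points: list of (x, y) pairs, even length n (Euclidean weights).
    ks:     list of n/2 integers with 1 <= k_j <= n - 2j + 2.
    Returns the list of matched pairs and the total weight.
    """
    rng = random.Random(seed)
    w = lambda a, b: math.dist(points[a], points[b])
    alive = list(range(len(points)))
    matching, total = [], 0.0
    for k in ks:
        chosen = rng.sample(alive, k)            # k_j distinct vertices
        # S_j: one edge from each chosen vertex to its least-weight neighbor.
        s = []
        for u in chosen:
            v = min((x for x in alive if x != u), key=lambda x: w(u, x))
            s.append((w(u, v), u, v))
        avg = sum(e[0] for e in s) / len(s)
        # Any edge of S_j with weight <= average is admissible; fall back to
        # the minimum (always admissible) in case of floating-point rounding.
        wt, u, v = next((e for e in s if e[0] <= avg), min(s))
        matching.append((u, v))
        total += wt
        alive.remove(u)
        alive.remove(v)
    return matching, total

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(10)]
weak, w_weak = greedy_k(pts, [1] * 5)                # GREEDY(1, ..., 1)
strong, w_strong = greedy_k(pts, [10, 8, 6, 4, 2])   # GREEDY(n, n-2, ..., 2)
print(w_weak, w_strong)
```

Both calls return a perfect matching of the ten points; only the amount of information examined at each stage differs.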

For notational convenience, we denote the above heuristics by GREEDY($k_1, \ldots, k_{n/2}$).
Clearly, there are exponentially many such heuristics. We will consider some special cases.
At one extreme is GREEDY(1, ..., 1), which is the "weakest". At the other extreme is
GREEDY($n, n-2, \ldots, 2$), which forms the "strongest" subclass of GREEDY($k_1, \ldots, k_{n/2}$).
GREEDY($n, n-2, \ldots, 2$), with the additional stipulation that, at each stage $j$, an edge $e_j \in S_j$
of minimum weight is selected, is precisely the ordinary greedy described in section 1. In
between, there are several interesting heuristics. For instance, one may examine no more
than a fixed number of vertices at each stage: For a given constant $3 \le k \le n$ let $k_j = k$ for
all $j$ such that $k \le n-2j+2$, and $k_j = n-2j+2$ otherwise. This results in
GREEDY($k, \ldots, k, 2\lceil k/2 \rceil - 2, \ldots, 2$). Alternately, one may examine the least-weight neighbors of
a fixed portion of the vertices at each stage, i.e. $k_j = \lceil (n-2j+2)/p \rceil$ for $j = 1, \ldots, n/2$, where $p$
is a positive integer.

Theorem 3: The ratios $f(n)$ for GREEDY(1, ..., 1), GREEDY($k, \ldots, k, 2\lceil k/2 \rceil - 2, \ldots, 2$),
GREEDY($n, n-2, \ldots, 2$), and GREEDY($\lceil n/p \rceil, \lceil (n-2)/p \rceil, \ldots, \lceil 2/p \rceil$), are bounded above by
$2^{n/2} - 1$, $\lceil k/2 \rceil (1+2/k)^{n/2 - \lceil k/2 \rceil + 1} - 1$, $n/2$, and $2^p (p!)^2 \binom{n/2+p}{p} / (2p)! - 1$, respectively.

Proof: For each case we compute the bounds on the $\alpha_j$'s from Lemma 2, and apply
Theorem 2. For the first one of these heuristics, we have $\alpha_j \le 1$ for all $j$. For the
second, we have $\alpha_j \le 2/k$ for $j = 1, \ldots, n/2 - \lceil k/2 \rceil + 1$, and $\alpha_j \le 2/(n-2j+2)$ for
$j = n/2 - \lceil k/2 \rceil + 2, \ldots, n/2$. For the third, the bound is directly obtained from the previous
case by selecting $k = n$. For the last one, $\alpha_j \le 2p/(n-2j+2)$ for $j = 1, \ldots, (n-2p+2)/2 - 1$, and
$\alpha_j \le 1$ for $j = (n-2p+2)/2, \ldots, n/2$. ■

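As a sanity check, the four closed-form bounds of Theorem 3 can be compared with the product bound of Theorem 2 evaluated on the stage-wise bounds for the $\alpha_j$ given in the proof. The sketch below is illustrative only; the values $n = 20$, $k = 6$, $p = 3$ and all identifiers are assumptions made here, not taken from the paper.

```python
from math import ceil, comb, factorial, isclose, prod

def product_bound(alphas):
    """Theorem 2 bound: prod(1 + alpha_j) - 1."""
    return prod(1 + a for a in alphas) - 1

n, k, p = 20, 6, 3                 # illustrative values (n even)
remaining = range(n, 0, -2)        # n - 2j + 2 for j = 1, ..., n/2

# Stage-wise bounds on alpha_j as used in the proof of Theorem 3.
bounds = {
    "weak": product_bound([1] * (n // 2)),
    "fixed_k": product_bound([2 / min(k, m) for m in remaining]),
    "strong": product_bound([2 / m for m in remaining]),
    "portion": product_bound([1 if m <= 2 * p else 2 * p / m for m in remaining]),
}

# Closed forms stated in Theorem 3.
closed = {
    "weak": 2 ** (n // 2) - 1,
    "fixed_k": ceil(k / 2) * (1 + 2 / k) ** (n // 2 - ceil(k / 2) + 1) - 1,
    "strong": n / 2,
    "portion": 2 ** p * factorial(p) ** 2 * comb(n // 2 + p, p) / factorial(2 * p) - 1,
}

for name in bounds:
    print(name, bounds[name], closed[name], isclose(bounds[name], closed[name]))
```

The two columns agree for each of the four heuristics, which is what the telescoping products in the proof predict.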
Although the error bounds for the first three of these heuristics are tight [7], more
restrictive edge-selection criteria may result in improved error bounds, e.g. the ordinary-greedy
implementation of GREEDY($n, n-2, \ldots, 2$) results in an $f(n) \le \tfrac{4}{3}\, n^{\log_2(3/2)} - 1$ (see


It is worthwhile to note that distinct members of GREEDY($k_1, \ldots, k_{n/2}$) may have identical
worst-case error bounds that may be tight [7]. Such is the case for all GREEDY($k_1, \ldots, k_{n/2}$)
with $k_j = 1$ or 2.

4.  Time  complexity 

The time complexity of all GREEDY($k_1, \ldots, k_{n/2}$) heuristics is bounded above by that of the
ordinary greedy which is $O(n^2 \log n)$ for general weights and $O(n^{1.5} \log n)$ for Euclidean
problems in the plane (see Bentley and Saxe [1]). However, an improvement is possible for
some special cases. For instance, GREEDY(1, ..., 1) can be implemented in $O(n^2)$ time for
general weights satisfying the triangle inequality. Furthermore, if the vertices are selected
randomly at each stage, the average time complexity for the Euclidean case is $O(n \log n)$:
Initially construct the Delaunay triangulation of the given vertices in $O(n \log n)$ time (see
e.g., Guibas and Stolfi [9]), and update this data structure after each stage. The latter
requires $O(d \log d)$ time per edge deletion, where $d$ is the sum of the degrees of its two
vertices. The average degree of a vertex in the triangulation is 6. Also, the average
degree of a vertex that is a nearest neighbor of another vertex is at most 36, since no
vertex can be the nearest neighbor of more than 6 vertices. Hence, the average of the sum
of the degrees of a vertex and of its nearest neighbor is at most 42, and the average time
to implement the deletion of an edge in the triangulation is $O(1)$. Similarly,
GREEDY($k, \ldots, k, 2\lceil k/2 \rceil - 2, \ldots, 2$) can be implemented to run in $O(kn^2)$ time for general
weights. The previous argument may be generalized to show that, on the average, the sum
of the degrees of the vertices, upon which the edges in $S_j$ (defined in section 3) are
incident, is $O(k)$. Thus, the average time complexity of this heuristic for Euclidean
problems in the plane is $O(\max\{n \log n,\ nk \log k\})$.

5.  Discussion 

In  spite  of  the  large  spread  in  the  theoretical  error  bounds  of  these  greedy  heuristics,  our 
preliminary  computational  results  with  vertices  uniformly  distributed  in  the  plane  indicate 
that  their  average  performance  is  surprisingly  similar.  For  each  n  =  50, 100, 200, 300,  we 
generated 10 problems and solved each one by the two "extreme" greedy heuristics, i.e.
GREEDY(1, ..., 1) and the ordinary greedy. The ratio of the average weight of the solutions
obtained by the former to that obtained by the latter was found to be within [1.08, 1.15].
We also solved these problems optimally using an implementation of Edmonds' algorithm
given in Burkard and Derigs [2]. The ratio of solutions obtained by GREEDY(1, ..., 1), and
by the ordinary greedy, to the optimal solution was found to be within [1.29, 1.45] and
[1.17, 1.29], respectively. Based upon these preliminary results, we conclude that the
performance exhibited by GREEDY(1, ..., 1) could be satisfactory for large practical instances
of  the  class  of  problems  we  have  tested. 

The worst-case error of the heuristics discussed in this paper can be considerably reduced
by selecting more than one edge into the matching at each stage. Grigoriadis and
Kalantari [8] have proposed such a class of heuristics, called "ONE-THIRD($k$)", where
$0 \le k \le \lfloor \log_3 n \rfloor$ is the number of heuristic stages. If $k = 0$, the heuristic is bypassed and
the problem is solved by an exact algorithm. Otherwise, it selects a matching $S_1$ of $K_n$,
with $|S_1| = \lfloor \tfrac{1}{3} n \rfloor$ such that $w(S_1)$ does not exceed a certain portion of the optimal weight.
Then, it discards all vertices matched by $S_1$, as well as their incident edges. This results
in a reduced complete graph. The heuristic repeats this process $k-1$ times, and at its
$(k+1)$-st stage, it obtains an optimal perfect matching of the remaining complete graph.

At each stage $j$, the set of matching edges $S_j$ is obtained by first extracting a subgraph
whose edges are determined by computing the least-weight neighbors of its vertices. Then,
the edges of each connected component of this graph are duplicated to give an Eulerian
multigraph, and an Euler tour is constructed which is subsequently reduced to a
Hamiltonian tour. From the union $H_j$ of these Hamiltonians, a subset of size
$|S_j| = \lfloor \tfrac{1}{3} |H_j| \rfloor$ is selected with $w(S_j) \le \tfrac{1}{3} w(H_j)$.

This class of heuristics produces an approximate solution with weight of at most $2 (\tfrac{7}{3})^k - 1$
times the optimal weight, in $O(\max\{n^2, (n/3^k)^3\})$ time. For Euclidean problems, this time
can be reduced to $O(\max\{n \log n, (n/3^k)^3\})$. These are substantial improvements over the
class of weak greedy heuristics which select a single edge at each stage. In particular,
when $k = \lfloor \tfrac{1}{3} \log_3(n^2/\log_3 n) \rfloor$, we have $f(n) \le 2 (n^2/\log_3 n)^{\alpha} - 1$, where $\alpha = \tfrac{1}{3} \log_3(\tfrac{7}{3}) = 0.2571$.
The time complexity is $O(n^2)$ for general weights and $O(n \log n)$ for problems in the
Euclidean plane. This case is "optimal" with respect to time, since in the worst case, no
heuristic with lower time complexity can produce a finite-valued $f(n)$ [6]. Whether these
theoretical improvements over existing matching heuristics may also offer superior
performance in practice is the subject of further computational experimentation.

SENSITIVITY ANALYSIS


FOR

STATIONARY PROBABILITIES OF MARKOV CHAINS


Peter  W.  Glynn1 
University  of  Wisconsin 


ABSTRACT 

This paper considers the problem of evaluating the sensitivity of a
steady-state cost $\alpha(\theta)$ to underlying uncertainty in a parameter vector $\theta$
governing the probabilistic dynamics of the system under consideration. We
show that the gradient $\nabla\alpha(\theta)$ plays a fundamental role in the parametric
statistical theory for Markov processes. We then survey numerical methods
available for evaluating $\nabla\alpha(\theta)$ and introduce a new Monte Carlo estimator
for $\nabla\alpha(\theta)$, which is applicable to Markov processes of substantial
generality.

INTRODUCTION


Let $X = \{X_n : n \ge 0\}$ be an irreducible positive recurrent Markov
chain governed by a transition kernel $P(\theta)$, where $\theta$ is a parameter
vector taking values in $\mathbb{R}^d$. If $\pi(\theta)$ is the stationary measure of
$P(\theta)$ and $f(\theta,x)$ is the cost of running the chain while in state $x$, then
$\alpha(\theta) = \int f(\theta,x)\, \pi(\theta,dx)$ is the long-run average cost of running $X$ under
parameter choice $\theta$. In many applications settings, it is of interest to
compute the sensitivity of $\alpha$ to (infinitesimal) changes in the parameter
$\theta$. Specifically, it is frequently useful to be able to evaluate $\nabla\alpha(\theta)$,
the gradient of $\alpha(\cdot)$ evaluated at $\theta \in \mathbb{R}^d$. Since it is generally
impossible to analytically evaluate $\nabla\alpha(\theta)$ (except for simple models), this
paper will concentrate on numerical methods for determining $\nabla\alpha(\theta)$.

This paper is organized as follows. In Section 2, we introduce an
important statistical application for these methods. We show that the
numerical methods discussed here offer the opportunity to do statistical
point, variance, and interval estimation for highly complex functionals of
analytically intractable Markov processes. Section 3 is devoted to the
formal derivation of an expression for $\nabla\alpha(\theta)$ and describes, for finite
state Markov chains, a set of linear equations which characterizes $\nabla\alpha(\theta)$.
For complicated stochastic processes, the corresponding linear systems are
too complex to solve via standard numerical methods, and Monte Carlo
techniques therefore become relevant. Thus, Section 4 provides a (new)
Monte Carlo estimator for $\nabla\alpha(\theta)$, which is applicable to Markov chains of
substantial generality. Finally, Section 5 offers a brief summary of the
paper.

2.  STATISTICAL RELEVANCE OF THE GRADIENT

Suppose that the transition kernel $P$ governing the Markov chain $X$
is determined by a finite family of distributions $(F_1, \ldots, F_m) =
(F_1(\gamma_1), \ldots, F_m(\gamma_m))$, where each $F_i(\gamma_i)$ is a probability distribution
associated with a known parametric family in which $\gamma_i \in \mathbb{R}^{d_i}$. If
$\theta = (\gamma_1, \ldots, \gamma_m)$, then $P$ can be viewed as a function of $\theta$, namely
$P = P(\theta)$.

In statistical contexts, the vector $\theta \in \mathbb{R}^d$ ($d = d_1 + \cdots + d_m$)
is, in general, unknown. Most of the literature on statistical inference
for Markov processes has concentrated on estimation of the "true" parameter
$\theta^*$ (i.e., estimation of $\theta$ when the observed chain $X$ is governed by
$P(\theta^*)$) and on related issues such as production of variance estimates and
confidence intervals. However, in many applications settings, it is of
more practical importance to estimate not $\theta^*$ but some associated steady-state
cost $\alpha(\theta^*)$.


(2.1)  EXAMPLE. Let $X = \{X_n : n \ge 0\}$ be the Markov chain consisting of
waiting times of consecutive customers in the M/M/1/$\infty$ queue. (See HEYMAN
and SOBEL (1982) for a description.) Arrivals follow an exp($\gamma_1$) distribution,
whereas service times are distributed exp($\gamma_2$). Suppose that the
long-run customer waiting time $\alpha(\theta)$ is of importance, where $\theta = (\gamma_1, \gamma_2)$.
The objective is to produce estimates for $\alpha(\theta^*)$, as well as variance and
interval estimates, from observed inter-arrival times $Y_{11}, \ldots, Y_{1n_1}$ as
well as observed service times $Y_{21}, \ldots, Y_{2n_2}$. Note that in certain settings,
the inter-arrival times and service times may have been collected
from two independent sources, so that no waiting times for the system are
available. For example, the queue might correspond to a telephone switching
system being designed, in which historical inter-arrival data exists
and service time data for the proposed switching device is available.

(2.2)  EXAMPLE. Virtually any general discrete-event stochastic system can
be formulated as a generalized semi-Markov process (GSMP). A GSMP can,
in turn, be viewed as a Markov chain $X = \{X_n : n \ge 0\}$, where $X_n = (S_n, C_n)$
records the "physical state" $S_n$ (e.g., configuration of customers in a
queue) and clock readings $C_n$ (e.g., remaining service times for each of
the customers in the system) at the $n$th transition of the GSMP. (For
further details, see GLYNN (1983).) GSMP's are characterized probabilistically
by certain distributions $F_1, \ldots, F_m$ governing the way clocks are
reset (e.g., service times in a queue) and by routing probabilities
$p_1, \ldots, p_\ell$ (e.g., the proportion of customers who visit station $j$ after
receiving service at station $i$).

In many applications environments, the distributions $F_1, F_2, \ldots, F_m$
and routing probabilities $p_1, \ldots, p_\ell$ are unknown and must be estimated
via statistical methods. If one models the distributions $F_1, \ldots, F_m$ as
belonging to parametric families (i.e., $F_i = F_i(\gamma_i)$), then the transition
function $P$ governing $X$ can be viewed as $P = P(\theta)$, where $\theta =
(\gamma_1, \ldots, \gamma_m, p_1, \ldots, p_\ell)$. The performance of a stochastic system is
often assessed by considering a long-run average cost $\alpha$ for the system
which, in this context, can be regarded as a function $\alpha = \alpha(\theta)$ of the
unknown parameter $\theta$ associated with $P$. Consequently, an important
statistical objective involves point and interval estimation of $\alpha(\theta^*)$,
where $\theta^*$ is the "true" parameter governing the system.

We will now outline a method for obtaining point and interval
estimates for $\alpha(\theta^*)$, which is applicable to very general stochastic
systems. Let $\hat\theta = (\hat\theta_1(n_1), \ldots, \hat\theta_d(n_d))$ be an estimator for $\theta^* =
(\theta^*_1, \ldots, \theta^*_d)$. ($n_i$ is the sample size associated with estimation
of $\theta^*_i$.) Such estimators $\hat\theta$ are frequently available for complex systems.
In particular, one can often appeal to maximum likelihood estimation (MLE)
methods for estimating $\theta^*$. Under very general conditions, $\hat\theta$ will be
asymptotically normal, in the sense that

(2.3)  $\hat\theta \overset{D}{\approx} N(\theta^*, C(n_1, \ldots, n_d))$

where $N(\theta^*, C(n_1, \ldots, n_d))$ is a multivariate normal r.v. with mean $\theta^*$
and covariance matrix $C(n_1, \ldots, n_d)$. ($\overset{D}{\approx}$ denotes "has approximately
the distribution of".) In certain design settings (see Example 2.1), the
data for each of the different components $\theta^*_i$ is gathered from independent
sources. In this case, $C(n_1, \ldots, n_d)$ takes the diagonal form

(2.4)  $C(n_1, \ldots, n_d) = \operatorname{diag}(\sigma_1^2/n_1, \ldots, \sigma_d^2/n_d)$.

If $\alpha$ is continuously differentiable in a neighborhood of $\theta^*$, then a
Taylor expansion of $\alpha$ around $\theta^*$ shows that (2.3) yields

(2.5)  $\alpha(\hat\theta) \overset{D}{\approx} N(\alpha(\theta^*),\ \nabla\alpha(\theta^*)^t\, C(n_1, \ldots, n_d)\, \nabla\alpha(\theta^*))$

where $\nabla\alpha(\theta^*)$ is the (column) gradient of $\alpha$ evaluated at $\theta^*$. (This is
the so-called "delta method" of statistics.)

Relation (2.5) shows that if $n_1, \ldots, n_d$ are large, then $\alpha(\hat\theta)$ is
a good point estimator for $\alpha(\theta^*)$. Let $\hat C(n_1, \ldots, n_d)$ be an estimator
for $C(n_1, \ldots, n_d)$ (such variance estimators are commonly available for
MLE point estimates $\hat\theta$). Then, (2.5) shows that

(2.6)  $v = \nabla\alpha(\hat\theta)^t\, \hat C(n_1, \ldots, n_d)\, \nabla\alpha(\hat\theta)$

is an estimator for the variance of $\alpha(\hat\theta)$ and

(2.7)  $[\alpha(\hat\theta) - z(\delta) v^{1/2},\ \alpha(\hat\theta) + z(\delta) v^{1/2}]$

is an approximate $100(1-\delta)\%$ confidence interval for $\alpha(\theta^*)$, where $z(\delta)$
is the solution of $P\{N(0,1) \le z(\delta)\} = 1 - \delta/2$. Thus, provided that $\alpha(\hat\theta)$
and $\nabla\alpha(\hat\theta)$ can be evaluated (either analytically or numerically), (2.6)
and (2.7) provide a solution to the variance and interval estimation
problems discussed above.

In the case that the covariance matrix $\hat C(n_1, \ldots, n_d)$ takes the
form (2.4), $v$ can be expressed as

(2.8)  $v = \sum_{i=1}^{d} \left( \frac{\partial}{\partial\theta_i} \alpha(\hat\theta) \right)^2 \hat\sigma_i^2 / n_i$.

Relation (2.8) shows that the contribution of uncertainty in $\theta^*_i$ to the
variance of $\alpha(\hat\theta)$ is given by $\left( \frac{\partial}{\partial\theta_i} \alpha(\hat\theta) \right)^2 \hat\sigma_i^2 / n_i$. This can be used to
determine which component to additionally sample if the current estimator
of $\alpha(\theta^*)$ is too "noisy."

(2.1)  EXAMPLE (continued). Because of the simplicity of the M/M/1/$\infty$
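queue, $\alpha$ can be analytically determined in closed form, namely $\alpha(\gamma_1, \gamma_2) =
\gamma_1 \gamma_2^{-1} (\gamma_2 - \gamma_1)^{-1}$ for $\gamma_1 < \gamma_2$ ($= \infty$ for $\gamma_1 \ge \gamma_2$). If $\hat\gamma_1 < \hat\gamma_2$, (2.8) reduces
to

$v = (\hat\gamma_2 - \hat\gamma_1)^{-4}\, \hat\sigma_1^2/n_1 + \hat\gamma_1^2 (2\hat\gamma_2 - \hat\gamma_1)^2\, \hat\gamma_2^{-4} (\hat\gamma_2 - \hat\gamma_1)^{-4}\, \hat\sigma_2^2/n_2$,

where $\hat\sigma_1^2$ ($\hat\sigma_2^2$) is a variance estimate for $\hat\gamma_1$ ($\hat\gamma_2$) formed from
$Y_{11}, \ldots, Y_{1n_1}$ ($Y_{21}, \ldots, Y_{2n_2}$).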

For more complicated systems, such as that described in Example 2.2,
$\alpha(\cdot)$ cannot be determined analytically, and so one must turn to numerical
algorithms. These algorithms will be described in the remaining sections
of this paper.
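
The interval construction (2.6)-(2.8) can be illustrated on the M/M/1 example. The sketch below is not from the paper and rests on assumptions stated here: $\alpha$ is taken to be the standard M/M/1 mean delay in queue, $\gamma_1/(\gamma_2(\gamma_2-\gamma_1))$, with $\gamma_1$ and $\gamma_2$ the arrival and service rates; each rate is estimated by the reciprocal of a sample mean (the exponential MLE), whose asymptotic variance constant is $\sigma_i^2 = \gamma_i^2$; and the data are simulated with illustrative true rates 1 and 2.

```python
import math
import random
from statistics import NormalDist, fmean

def alpha(g1, g2):
    """Mean steady-state delay in queue for M/M/1 (assumes g1 < g2)."""
    return g1 / (g2 * (g2 - g1))

def grad_alpha(g1, g2):
    """Partial derivatives of alpha with respect to g1 and g2."""
    d1 = 1.0 / (g2 - g1) ** 2
    d2 = -g1 * (2.0 * g2 - g1) / (g2 ** 2 * (g2 - g1) ** 2)
    return d1, d2

def delta_method_interval(inter_arrivals, services, delta=0.05):
    n1, n2 = len(inter_arrivals), len(services)
    g1 = 1.0 / fmean(inter_arrivals)      # MLE of the arrival rate
    g2 = 1.0 / fmean(services)            # MLE of the service rate
    d1, d2 = grad_alpha(g1, g2)
    # (2.8) with sigma_i^2 estimated by g_i^2 (exponential-rate MLE).
    v = d1 ** 2 * g1 ** 2 / n1 + d2 ** 2 * g2 ** 2 / n2
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    est = alpha(g1, g2)
    half = z * math.sqrt(v)
    return est, v, (est - half, est + half)   # point estimate, (2.6), (2.7)

rng = random.Random(42)
arrivals = [rng.expovariate(1.0) for _ in range(2000)]   # true gamma_1 = 1
services = [rng.expovariate(2.0) for _ in range(2000)]   # true gamma_2 = 2
est, v, (lo, hi) = delta_method_interval(arrivals, services)
print(est, v, (lo, hi))
```

Under these assumptions the true value is $\alpha(1, 2) = 0.5$; the two summands of `v` show which data source contributes more to the variance, in the sense of the remark following (2.8).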


3.  A  FORMULA  FOR  THE  GRADIENT  OF  THE  STEADY-STATE 

Let $P(\theta)$ be the transition function for $X$ under parameter $\theta$, so
that $P(\theta,x,A)$ is the corresponding conditional probability that $X_{n+1} \in A$,
given that $X_n = x$. For an initial distribution $\mu(\theta)$, let $P_\theta$ be the
probability measure on the path-space of $X$ associated with $P(\theta)$, namely

$P_\theta\{X_0 \in A_0, \ldots, X_n \in A_n\} = \int_{A_0} \mu(\theta,dx_0) \int_{A_1} P(\theta,x_0,dx_1) \cdots \int_{A_n} P(\theta,x_{n-1},dx_n)$.

If $X$ is Harris recurrent under $P(\theta)$ (see REVUZ (1984)), then there
exists a unique probability measure $\pi(\theta)$ such that

(3.1)  $\frac{1}{n} \sum_{k=0}^{n-1} f(\theta,X_k) \to \int_S f(\theta,x)\, \pi(\theta,dx)$  $P_\theta$ a.s.

as $n \to \infty$ (for a large class of $f(\theta)$'s). The measure $\pi(\theta)$ is stationary
for $P(\theta)$, in the sense that

(3.2)  $\pi(\theta,\cdot) = \int_S P(\theta,x,\cdot)\, \pi(\theta,dx)$.

($S$ is the state space of $X$.) In fact, $\pi(\theta)$ is the unique probability
measure satisfying (3.2). Our goal is to numerically compute $\alpha(\theta)$ and
$\nabla\alpha(\theta)$, where $\alpha(\theta)$ is the steady-state limit

(3.3)  $\alpha(\theta) = \int_S f(\theta,x)\, \pi(\theta,dx)$.

Since (3.2) only determines $\pi(\theta)$ up to a multiplicative constant, it
is necessary to add an additional constraint stating that the total mass
$\pi(\theta,S)$ equals 1. The quantity $\alpha(\theta)$ is then the unique solution of the
integral equation system

       $\pi(\theta,\cdot) = \int_S P(\theta,x,\cdot)\, \pi(\theta,dx)$
(3.4)  $\pi(\theta,S) = 1$
       $\alpha(\theta) = \int_S f(\theta,x)\, \pi(\theta,dx)$.

The system (3.4) is well known and has been extensively studied. If $S$ is
finite, then $P(\theta)$ is a finite matrix and (3.4) becomes

       $\pi(\theta)^t = \pi(\theta)^t P(\theta)$
(3.5)  $\pi(\theta)^t e = 1$
       $\alpha(\theta) = \pi(\theta)^t f(\theta)$

(all vectors are column vectors; $e$ is the vector consisting of 1's).
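
For a small finite chain, the system (3.5) can be solved numerically in a few lines. The following sketch is illustrative only: the three-state transition matrix and cost vector are made up for the example, and the stationary vector is obtained by repeated multiplication $\pi^t \leftarrow \pi^t P$, which assumes the chain is irreducible and aperiodic (a direct linear solve would serve equally well).

```python
def stationary(P, iters=10_000, tol=1e-14):
    """Approximate pi with pi^t = pi^t P and pi^t e = 1 by power iteration.

    Assumes P is an irreducible, aperiodic stochastic matrix (list of rows).
    """
    n = len(P)
    pi = [1.0 / n] * n                    # start from the uniform distribution
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Illustrative three-state chain and cost vector (not from the paper).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
f = [0.0, 1.0, 2.0]

pi = stationary(P)
alpha = sum(p * c for p, c in zip(pi, f))   # alpha = pi^t f
print(pi, alpha)
```

For this chain detailed balance gives $\pi = (1/4, 1/2, 1/4)$, so $\alpha = 1$.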

As we shall see, a similar system describes the gradient $\nabla\alpha(\theta)$ of
$\alpha$. Let us formally suppose that the transition function $P(\theta)$ can be
expanded as

$P(\theta + he_i) = P(\theta) + hQ_i(\theta) + o(h)$

where $e_i$ is the $i$th unit vector in $\mathbb{R}^d$. Assume that $\pi(\theta + he_i)$ is
formally differentiable at $h = 0$, so that there exists a signed measure
$\eta_i(\theta)$ such that

(3.6)  $\pi(\theta + he_i) = \pi(\theta) + h\eta_i(\theta) + o(h)$.

The stationarity equation (3.2) implies that $\eta_i(\theta)$ must satisfy

(3.7)  $\eta_i(\theta,\cdot) - \int_S \eta_i(\theta,dx)\, P(\theta,x,\cdot) = \int_S Q_i(\theta,x,\cdot)\, \pi(\theta,dx)$

(formally differentiate both sides of (3.2)). (The equation (3.7) is
Poisson's equation for the kernel $P(\theta)$.) These formal calculations can be
made rigorous, even in general state space; such arguments will appear
elsewhere.

In finite state space, the arguments are more straightforward and have
previously appeared in SCHWEITZER (1968), GOLUB and MEYER (1986), and MEYER
and STEWART (1986). We give a very elementary proof in the Appendix to
this paper; our argument uses only elementary Markov chain theory. Note
that in finite state space, (3.7) becomes $\eta_i(\theta)^t (I - P(\theta)) = \pi(\theta)^t Q_i(\theta)$.
This does not uniquely identify $\eta_i(\theta)$, since $\eta_i(\theta) + \delta\pi(\theta)$ also
satisfies the equation, for all $\delta$. Note that since $\pi(\theta)^t e = 1$ for all
$\theta$, it follows that $\eta_i(\theta)^t e = 0$ (see (3.6)). Let $\Pi(\theta)$ be the matrix in
which all rows are identical to $\pi(\theta)^t$. It is easily verified that since
$\eta_i(\theta)^t e = 0$, $\eta_i(\theta)^t \Pi(\theta) = 0$. Consequently, $\eta_i(\theta)$ also satisfies

(3.8)  $\eta_i(\theta)^t (I - P(\theta) + \Pi(\theta)) = \pi(\theta)^t Q_i(\theta)$.

It is well known (see KEMENY and SNELL (1960), p. 100) that $(I - P(\theta) + \Pi(\theta))$
has an inverse, called the fundamental matrix, which we shall denote $F(\theta)$.
Hence, in finite state space, the $i$th component of $\nabla\alpha(\theta)$ can be
computed as the solution of the system

       $\eta_i(\theta)^t = \pi(\theta)^t Q_i(\theta) F(\theta)$
(3.9)
       $\frac{\partial}{\partial\theta_i} \alpha(\theta) = \pi(\theta)^t f_i(\theta) + \eta_i(\theta)^t f(\theta)$

where $f_i(\theta)$ is the vector in which the $j$th component is $\partial f(\theta,j)/\partial\theta_i$.
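
The system (3.9) can be exercised on a toy model and compared with a finite-difference derivative. The sketch below is an illustration under assumptions made here, not the paper's code: the three-state family $P(\theta)$, the cost vector, and all function names are hypothetical, and the linear systems are solved by plain Gaussian elimination rather than by forming $F(\theta)$ explicitly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def stationary(P):
    """pi from (3.5): pi^t (I - P) = 0, one equation replaced by pi^t e = 1."""
    n = len(P)
    A = [[(i == j) - P[j][i] for j in range(n)] for i in range(n)]   # (I - P)^t
    A[-1] = [1.0] * n
    return solve(A, [0.0] * (n - 1) + [1.0])

def gradient(P, Q, f, df):
    """(3.9): d alpha / d theta_i = pi^t f_i + eta_i^t f, eta_i^t = pi^t Q_i F."""
    n = len(P)
    pi = stationary(P)
    y = [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]   # pi^t Q_i
    # eta^t (I - P + Pi) = y^t, solved as (I - P + Pi)^t eta = y.
    At = [[(i == j) - P[j][i] + pi[i] for j in range(n)] for i in range(n)]
    eta = solve(At, y)
    return sum(a * b for a, b in zip(pi, df)) + sum(a * b for a, b in zip(eta, f))

def model(theta):
    """Hypothetical one-parameter family: P(theta), dP/dtheta, f, df/dtheta."""
    P = [[1 - theta, theta, 0.0], [0.3, 0.5, 0.2], [0.0, theta, 1 - theta]]
    Q = [[-1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, -1.0]]
    f = [theta, 1.0, 3.0]
    df = [1.0, 0.0, 0.0]
    return P, Q, f, df

def alpha(theta):
    P, _, f, _ = model(theta)
    return sum(a * b for a, b in zip(stationary(P), f))

theta, h = 0.4, 1e-6
grad = gradient(*model(theta))
fd = (alpha(theta + h) - alpha(theta - h)) / (2 * h)   # central difference
print(grad, fd)
```

For this toy family $\alpha(\theta) = (1.3\theta + 0.6)/(\theta + 0.5)$, so the derivative is $0.05/(\theta + 0.5)^2$, and both printed numbers should agree with that value to within the finite-difference error.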

Consequently, when $S$ is finite, the systems of linear equations
(3.5) and (3.9) may be solved numerically to obtain $\alpha(\theta)$ and $\nabla\alpha(\theta)$. If
$S$ is not finite (or if the number of elements in $S$ is large), numerical
methods not dependent on explicit solution of linear equations must be
considered. In the next section, we show how Monte Carlo methods can be
used to advantage here.

4.  MONTE CARLO EVALUATION OF STEADY-STATE GRADIENTS

A critical assumption underlying the analysis of this section is that
it is possible to generate sample trajectories of $X$ under the measure
$P_\theta$. For the examples that we have in mind (see particularly Example 2.2),
this assumption is clearly in force.

Assuming now that $X$ has distribution $P_\theta$, relation (3.1) states
that

(4.1)  $\frac{1}{n} \sum_{k=0}^{n-1} f(\theta,X_k) \to \alpha(\theta)$  $P_\theta$ a.s.

as $n \to \infty$. In other words, rather than solving the integral equation
system (3.4), one may numerically approximate $\alpha(\theta)$ by the sample average
appearing on the left-hand side of (4.1). The simplicity of this numerical
procedure, as well as its broad applicability, is the source of the power
of the Monte Carlo method. Our objective here is to obtain a similar Monte
Carlo algorithm for evaluation of the gradient $\nabla\alpha(\theta)$.

Observe that (at least formally) we have

(4.2)  $\frac{\partial}{\partial\theta_i} \alpha(\theta) = \int_S \frac{\partial}{\partial\theta_i} f(\theta,x)\, \pi(\theta,dx) + \int_S f(\theta,x)\, \eta_i(\theta,dx)$.

A Monte Carlo estimator for the first term appearing on the right-hand side
of (4.2) is given by the sample mean

(4.3)  $\frac{1}{n} \sum_{k=0}^{n-1} \frac{\partial}{\partial\theta_i} f(\theta,X_k)$.

It remains to obtain an estimator for the second term.

As in the finite state space context, one expects that the signed
measure $\eta_i(\theta)$ will satisfy $\eta_i(\theta,S) = 0$. As a consequence, it follows
from (3.7) that $\eta_i(\theta)$ should satisfy

(4.4)  $\eta_i(\theta,\cdot) - \int_S \eta_i(\theta,dx)\, P(\theta,x,\cdot) + \int_S \eta_i(\theta,dx)\, \pi(\theta,\cdot) = \int_S Q_i(\theta,x,\cdot)\, \pi(\theta,dx)$.

Letting $\Pi(\theta)$ be the operator $\Pi(\theta,x,\cdot) = \pi(\theta,\cdot)$, one can write (4.4)
symbolically as

(4.5)  $\eta_i(\theta)(I - P(\theta) + \Pi(\theta)) = \pi(\theta) Q_i(\theta)$.

(This is the general state space analogue of (3.8).) The formal inverse of
$(I - P(\theta) + \Pi(\theta))$ is given by

$\sum_{k=0}^{\infty} (P(\theta) - \Pi(\theta))^k$.

Because of the stationarity of $\pi(\theta)$ and the independence of $\Pi(\theta,x,\cdot)$
from $x$, it follows that $(P(\theta) - \Pi(\theta))^k = P(\theta)^k - \Pi(\theta)$, for $k \ge 1$. Hence,
a formal analysis of (4.5) shows that

$\eta_i(\theta) = \pi(\theta) Q_i(\theta) + \sum_{k=1}^{\infty} \pi(\theta) Q_i(\theta) (P^k(\theta) - \Pi(\theta))$.

For the same reason that $\eta_i(\theta,S) = 0$, $Q_i(\theta,x,S) = 0$ and hence
$Q_i(\theta)\Pi(\theta) = 0$. Consequently,

(4.6)  $\eta_i(\theta) f(\theta) = \sum_{k=0}^{\infty} \pi(\theta) Q_i(\theta) P(\theta)^k f(\theta)$.

Suppose that the measures $P(\cdot,x,dy)$ are absolutely continuous with
respect to $P(\theta,x,dy)$ in a neighborhood of $\theta$. Then, one expects that
$Q_i(\theta,x,dy)$ has a density with respect to $P(\theta,x,dy)$, call it $q_i(\theta,x,y)$. A
typical term on the right-hand side of (4.6) then takes the form

$\int_S \pi(\theta,dx) \int_S q_i(\theta,x,y)\, P(\theta,x,dy) \int_S P^k(\theta,y,dz)\, f(\theta,z)$

which can be represented probabilistically as an expectation:

$E_\theta[q_i(\theta,X_0,X_1)\, f(\theta,X_{k+1})]$

where $E_\theta(\cdot)$ is the expectation corresponding to $P_\theta$, and $P_\theta$ is the
probability on path-space associated with initial distribution $\pi(\theta)$ and
transition function $P(\theta)$. Thus, the second term in (4.2) has the formal
representation

(4.7)  $\sum_{k=0}^{\infty} E_\theta[q_i(\theta,X_0,X_1)\, f(\theta,X_{k+1})]$.

The formula (4.7) is the key to the Monte Carlo analysis.

Each term in (4.7) can be consistently estimated (under suitable
hypotheses) via

$\frac{1}{n} \sum_{j=0}^{n-1} q_i(\theta,X_j,X_{j+1})\, f(\theta,X_{j+k+1})$

when $X$ evolves according to transition function $P(\theta)$ (regardless of $X$'s
initial distribution). In order to estimate the infinite sum, a standard
device is to consider an estimator of the form

(4.8)  $\sum_{k=0}^{\ell(n)} \frac{1}{n-\ell(n)} \sum_{j=0}^{n-\ell(n)-1} q_i(\theta,X_j,X_{j+1})\, f(\theta,X_{j+k+1})$

where the truncation point $\ell(n)$ is keyed to the sample size $n$ in such a
way that $\ell(n) \to \infty$ with $\ell(n)/n \to 0$. The particular choice of $\ell(n)$
effects a compromise between bias and variance effects in estimating the
infinite sum (4.7).

Since $q_i(\theta,x,y)$ is generally easily computable (for $S$ countable,
$q_i(\theta,j,k) = (\partial P(\theta,j,k)/\partial\theta_i) \cdot P(\theta,j,k)^{-1}$), (4.7) provides a Monte Carlo
solution to estimating the appropriate gradient.
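
The estimator obtained by adding (4.3) and (4.8) can be sketched for a finite chain as follows. This is a hedged illustration, not the author's implementation: the three-state family, the seed, the path length, and the truncation point `ell = 5` are all assumptions made here, and the function names are hypothetical. The quantity `exact` is the derivative computed in closed form for this toy family only.

```python
import random

def simulate(P, n, rng, x0=0):
    """Return a path X_0, ..., X_n of the chain with transition matrix P."""
    states = range(len(P))
    path = [x0]
    for _ in range(n):
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return path

def gradient_estimate(path, f, df, q, ell):
    """Sum of (4.3) and (4.8) computed from one path X_0, ..., X_n.

    f, df: cost vector and its theta-derivative, indexed by state.
    q:     function (x, y) -> q_i(theta, x, y).
    ell:   truncation point l(n), with 0 <= ell < n.
    """
    n = len(path) - 1
    first = sum(df[x] for x in path[:n]) / n                      # (4.3)
    m = n - ell
    second = 0.0
    for k in range(ell + 1):                                      # (4.8)
        second += sum(q(path[j], path[j + 1]) * f[path[j + k + 1]]
                      for j in range(m)) / m
    return first + second

# Hypothetical one-parameter family, evaluated at theta = 0.4.
theta = 0.4
P = [[1 - theta, theta, 0.0], [0.3, 0.5, 0.2], [0.0, theta, 1 - theta]]
Q = [[-1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, -1.0]]         # dP/dtheta
f = [theta, 1.0, 3.0]
df = [1.0, 0.0, 0.0]
q = lambda x, y: Q[x][y] / P[x][y]   # only evaluated on transitions with P > 0

path = simulate(P, 100_000, random.Random(7))
estimate = gradient_estimate(path, f, df, q, ell=5)
exact = 0.05 / (theta + 0.5) ** 2    # closed form for this toy family
print(estimate, exact)
```

Because the cost values are not centered, the variance of the second term grows with the truncation point, which is the bias-variance compromise mentioned above; a single run of this length should be read as a rough estimate only.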

It  turns  out  that  (4.6)  is  closely  related  to  a  formula  which  one 
obtains  when  one  uses  likelihood  ratio  change-of-measure  ideas  to  evaluate 
gradients.  These  connections  will  be  explored  more  fully  in  a  future 
paper. 


5.  SUMMARY 

We have shown that the gradient $\nabla\alpha(\theta)$ of steady-state quantity $\alpha$
plays a critical role in the variance and interval estimation theory for
steady-state estimators $\alpha(\hat\theta)$ of complex stochastic systems. In some
sense, the large-sample variance and interval estimation theory is fully
solved given that one can evaluate $\alpha(\hat\theta)$ and $\nabla\alpha(\hat\theta)$. Numerical methods
for dealing with $\alpha(\hat\theta)$ when the system is Markov are, of course, well
known. However, numerical algorithms for evaluating $\nabla\alpha(\hat\theta)$ are a recent
development. We have therefore provided a self-contained exposition of the
relevant theory, and discussed both Monte Carlo (see (4.3) and (4.8)) and
non-Monte Carlo (see (3.9)) approaches to solving the problem.

APPENDIX 

Let $P(\cdot)$ be a family of $n \times n$ stochastic matrices which are:

(i)  irreducible in a neighborhood of $\theta$
(ii)  differentiable at $\theta$.

Under (i), $P(\cdot)$ has a unique stationary distribution $\pi(\cdot)$ in a
neighborhood of $\theta$. Our goal is to rigorously verify the first equation in
(3.9).

Given the existence of the inverse matrix $F(\theta) = (I - P(\theta) + \Pi(\theta))^{-1}$,
(3.9) follows immediately once the differentiability of $\pi(\theta)$ is
established. Note that for $h$ sufficiently small,

$\pi(\theta + he_i)^t - \pi(\theta)^t = \pi(\theta + he_i)^t [P(\theta) + hQ_i(\theta) + o(h)] - \pi(\theta)^t P(\theta)$

so

$[\pi(\theta + he_i) - \pi(\theta)]^t (I - P(\theta)) = h\pi(\theta + he_i)^t Q_i(\theta) + o(h)$

(note that $o(h)\pi(\theta + he_i) = o(h)$ since all terms in $\pi(\theta + he_i)$ are uniformly
(in $h$) bounded by 1). Since $\Pi(\theta)$ has identical rows and $\pi(\theta + he_i)$
is stochastic for $h \ge 0$, it follows that $[\pi(\theta + he_i) - \pi(\theta)]^t \Pi(\theta) = 0$. Hence,

(A1)  $[\pi(\theta + he_i) - \pi(\theta)]^t = h\pi(\theta + he_i)^t Q_i(\theta) F(\theta) + o(h)$.

Again, since $\pi(\theta + he_i)$ is uniformly bounded in $h$, it is evident from (A1)
that $\pi(\theta + he_i)$ is continuous at $h = 0$. Thus, (A1) implies that

$[\pi(\theta + he_i) - \pi(\theta)]^t = h\pi(\theta)^t Q_i(\theta) F(\theta) + o(h)$

i.e., $\eta_i(\theta)^t = \pi(\theta)^t Q_i(\theta) F(\theta)$,

which is the required result.

Stochastic Differential Forms


Eugene Wong¹


1.  Introduction 

A substantial body of results on stochastic integration with respect to multiparameter martingales now exists. Yet, as it stands, the theory is not entirely satisfactory in a number of ways. In particular, the calculus for stochastic integration, already complicated in two dimensions, becomes prohibitively so in higher dimensions. In retrospect, the source of the difficulty seems to be that integration over n-dimensional volumes in n-space is only a very small part of a complete theory of integration in n-space. What seems to be needed is a theory of differential forms involving martingales and integration of such forms on sets of appropriate dimensionality. To embark on a course to develop such a theory is the objective of the work reported here.

Our approach to stochastic differential forms follows the general approach of Whitney [2], and forms are defined as functions on chains or functions parametrized by chains satisfying certain continuity conditions. While the flat cochains defined by Whitney ([2], ch. IX) have the representation

$$X(\sigma) = \int_\sigma x_t \, dt \qquad (1.1)$$

we  cannot  expect  such  a  representation  to  hold  for  any  class  of  stochastic  forms  that  includes  the 
Wiener  process.  However,  as  we  intend  to  show  in  this  paper,  an  exterior  calculus  for  martingale 
forms  can  be  constructed  without  such  a  representation. 

The focus of this paper will be on the conceptual framework needed for the development of a theory of differential forms. Details on some aspects of this paper can be found in a forthcoming paper by M. Zakai and the author [5].


¹Department of Electrical Engineering and Computer Sciences and the Electronics Research Laboratory, University of California, Berkeley, CA 94720.

2. Co-Chains and Forms


Let $\sigma_i$ denote a finite interval open to the left and closed to the right on the $t_i$ axis. For $i < j$, $\sigma_i \times \sigma_j$ will denote a positively oriented 2-dimensional rectangle with sides $\sigma_i$ and $\sigma_j$, and $\sigma_j \times \sigma_i = -\sigma_i \times \sigma_j$ will denote the same rectangle with a negative orientation. In general, let $\sigma_{i_1}, \sigma_{i_2}, \ldots, \sigma_{i_r}$ denote intervals as above. Then $\sigma_{i_1} \times \sigma_{i_2} \times \cdots \times \sigma_{i_r}$ will denote an r-dimensional rectangle with sides $\sigma_{i_1}, \ldots, \sigma_{i_r}$. The orientation is positive if an even permutation of $(i_1, \ldots, i_r)$ puts it into increasing order, and the orientation is negative otherwise. We call such rectangles oriented r-rectangles and refer to $[i]$ as the direction of $\sigma_{i_1} \times \sigma_{i_2} \times \cdots \times \sigma_{i_r}$.

We note that the boundary $\partial\sigma$ of an oriented (r+1)-rectangle $\sigma$ is a collection of oriented r-rectangles that overlap at most on boundaries. Subdivision of an r-rectangle produces a collection of r-rectangles. It is useful to denote such a collection by a sum $\sigma_1 + \sigma_2 + \cdots + \sigma_m$. Furthermore, if $\sigma$ is an oriented r-rectangle it is useful to denote by $-\sigma$ the same rectangle with the opposite orientation. It is therefore useful to introduce linear combinations

$$A = \sum_{k=1}^{m} \alpha_k \sigma_k \qquad (2.1)$$

where $\alpha_k$ are real numbers taking values in $\{-1, 1\}$ and $\sigma_k$ are oriented r-rectangles. We shall call any sum of the form (2.1) an r-chain.

Let $X(\sigma)$ be a real-valued random function defined on $(\Omega, F, P)$ and parametrized by oriented r-rectangles such that

(a) $X(\sigma)$ is defined for every oriented r-rectangle $\sigma$

(b) $X(\sigma) = -X(-\sigma)$ and for disjoint rectangles $X(\sum_{k=1}^{m} \sigma_k) = \sum_{k=1}^{m} X(\sigma_k)$

We can extend $X$ to all rectangular r-chains by linearity and $X$ so extended is termed a random r-cochain.

Intuitively we would like to write a random r-cochain $X(\sigma)$ as an integral over $\sigma$,

$$X(\sigma) = \int_\sigma X$$

where the integrand $X$ is a "stochastic differential r-form." For the concept of stochastic differential forms to be useful, we need to integrate $X$ not only on rectangular chains, but also on suitable r-surfaces. For this purpose, we need an appropriate topology on chains and a corresponding continuity condition on co-chains with respect to such topology.

Let $|\sigma|$ denote the r-dimensional volume of the oriented rectangle $\sigma$ with $|\sigma| = 1$ for $r = 0$. For $A$ defined by (2.1) with disjoint $\sigma_k$, $k = 1, \ldots, m$, the mass of a chain $A$ is defined as

$$|A| = \sum_{k} |\alpha_k| \, |\sigma_k| .$$

Turning to another norm, let $\{A_m, m = 1, 2, \ldots\}$ be a sequence of r-chains. We say that the sequence is a Cauchy sequence if either

$$|A_m - A_k| \to 0 \quad \text{as } m, k \to \infty$$

or if for every $m, k$ there is an (r+1)-chain $B_{m,k}$ such that $\partial B_{m,k} = A_m - A_k$ and

$$|B_{m,k}| \to 0 \quad \text{as } m, k \to \infty .$$

Note that for the convergence of an n-chain in $R^n$, only the first type of convergence makes sense, while for the convergence of a 1-chain in $R^2$ to a curve the second type of convergence is necessary. Therefore, it is useful to define the flat norm $|A|^\flat$ for an r-chain in $R^n$ by ([2], p. 151)

$$|A|^\flat = \inf (|A - \partial B| + |B|) \qquad (2.2)$$

where the infimum is over all (r+1)-chains $B$. It is shown in [2] that $|A + B|^\flat \le |A|^\flat + |B|^\flat$ and $|A|^\flat = 0$ if and only if $A = 0$. Hence, $|\cdot|^\flat$ is a norm. Furthermore, $|\cdot|^\flat$ satisfies (see [2]):

$$|\partial A|^\flat \le |A|^\flat \le |A| \qquad (2.3)$$

Note that for $r = n$, $|A|^\flat = |A|$. For $r = 0$ and $A$ a point in $R^n$, $|A|^\flat = 1$. For the case where $A$ is the difference of two points, $s$ and $t$, $|A|^\flat = \min(2, |(s,t)|)$.

We can now define a stochastic differential r-form as the formal integrand of a stochastic r-cochain that is continuous in probability with respect to the flat norm defined above, i.e.,

$$X(A_m) \xrightarrow{P} 0 \quad \text{whenever } |A_m|^\flat \to 0 \text{ as } m \to \infty . \qquad (2.4)$$

Similarly a stochastic differential form is said to be an $L^q$ form or a q-integrable form if

$$E|X(A)|^q < \infty \qquad (2.5)'$$

and

$$E|X(A_m)|^q \to 0 \qquad (2.5)''$$

whenever $|A_m|^\flat \to 0$. In (2.4) and (2.5) we extend the definition of $X$ to limits of chains under the flat norm by adjoining the limits of $X(A_m)$.

As an example let $\eta$ be "Gaussian white noise" on $R^2$ defined as follows:

(a) $\eta(\sigma)$ is a Gaussian random function parametrized by oriented 2-rectangles $\sigma$

(b) $E\eta(\sigma) = 0$

(c) $E\eta(\sigma)\eta(\sigma') = \mu(\bar\sigma \cap \bar\sigma')$ if $\sigma$ and $\sigma'$ are similarly oriented

$= -\mu(\bar\sigma \cap \bar\sigma')$ otherwise

where $\bar\sigma$ denotes $\sigma$ without orientation and $\mu$ denotes the Lebesgue measure.

A Wiener process $W_t$, $t \in R^2_+$, is defined by

$$W_t = \eta(A_t)$$

where $A_t$ is the rectangle $\{s : 0 < s \le t\}$. The white noise $\eta$ is a random rectangular 2-cochain. Since $E\eta^2(\sigma) = |\sigma|$, (2.5) is satisfied. The Wiener process is a 0-cochain satisfying (2.5).

Next, we define the exterior derivative $dX$ of a random r-cochain $X$ (via the Stokes theorem) as follows. Set

$$dX(A) = X(\partial A) \qquad (2.6)$$

for all oriented (r+1)-chains $A$. The exterior derivative of a stochastic differential form as defined by (2.4) and (2.5) is also a stochastic differential form of the same type; this follows directly from the definition of $dX$ and the fact that $|\partial A|^\flat \le |A|^\flat$.

As we have defined them, random differential r-forms are random currents of Ito [1], but not every Ito current is an r-form in our sense. While linear operations are definable on all random currents, nonlinear operations (e.g., exterior products) are not. The r-forms that we have defined have the right degree of localization to allow exterior products to be defined.

If $X$ is a regular (nongeneralized) differential form, then $X$ can be represented as

$$\sum_{[i]} a_{[i]}(t) \, dt_{[i]} \qquad (2.7)$$

where the differentials $dt_{[i]} = dt_{i_1} \wedge dt_{i_2} \wedge \cdots \wedge dt_{i_r}$ provide a local coordinate system. For a random current such a representation is in general not possible, but a useful representation similar to this one still exists. For a random cochain $X$, define $X_{[i]}$ as the cochain such that for every rectangle $\sigma$

$$X_{[i]}(\sigma) = X(\sigma) \text{ if } \sigma \text{ has the direction } [i], \qquad X_{[i]}(\sigma) = 0 \text{ otherwise.} \qquad (2.8)$$

Then for any rectangular chain $A$

$$X(A) = \sum_{[i]} X_{[i]}(A) \qquad (2.9)$$

and if $X$ is a random differential form so is $X_{[i]}$. Hence we can write

$$X = \sum_{[i]} X_{[i]} \qquad (2.10)$$

and this is the equivalent of (2.7) for random differential forms.

The rectangular coordinate system provides an alternate but equivalent definition for the exterior derivative as follows. Define $d_k X_{[i]}$ for rectangles $\sigma$ as

$$d_k X_{[i]}(\sigma) = dX_{[i]}(\sigma) \text{ if } k \text{ is not in } [i] \text{ and } \sigma \text{ has direction } [k, [i]], \qquad d_k X_{[i]}(\sigma) = 0 \text{ otherwise.} \qquad (2.11)$$

Then we can write

$$d_k X = \sum_{[i]} d_k X_{[i]} \qquad (2.12)$$

and

$$dX = \sum_{k} d_k X . \qquad (2.13)$$

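For an example, consider a Wiener process $W_t$, $t \in R^2_+$. Take $\sigma_1$ and $\sigma_2$ to be the horizontal and vertical 1-rectangles

$$\sigma_1 = ((t_1, t_2), (t_1 + a, t_2)], \qquad \sigma_2 = ((t_1, t_2), (t_1, t_2 + b)]$$

oriented from the left (from below) to the right (to above). We have

$$d_1 W(\sigma_1) = W_{t_1 + a, t_2} - W_{t_1, t_2}$$
$$d_2 W(\sigma_2) = W_{t_1, t_2 + b} - W_{t_1, t_2}$$
$$d_1 W(\sigma_2) = d_2 W(\sigma_1) = 0$$

Now, take a positively oriented 2-rectangle $\sigma$ with opposite corners $(t_1, t_2)$ and $(t_1 + a, t_2 + b)$. Its boundary $\partial\sigma$ is given by:

$$\partial\sigma = \{ \sigma_1, \; -\sigma_2, \; ((t_1 + a, t_2), (t_1 + a, t_2 + b)], \; -((t_1, t_2 + b), (t_1 + a, t_2 + b)] \}$$

Hence

$$d(d_1 W)(\sigma) = (W_{t_1 + a, t_2} - W_{t_1, t_2}) - (W_{t_1 + a, t_2 + b} - W_{t_1, t_2 + b}) = -\eta(\sigma) \qquad (2.14)$$

and

$$d(d_2 W)(\sigma) = \eta(\sigma) \qquad (2.15)$$

We can interpret (2.14) and (2.15) as follows:

$$d(d_1 W) = d_1 d_1 W + d_2 d_1 W$$
$$d(d_2 W) = d_1 d_2 W + d_2 d_2 W$$

with $d_1 d_1 W = d_2 d_2 W = 0$, $d_2 d_1 W = -d_1 d_2 W$ and $d_1 d_2 W = d_{12} W = \eta$. Observe that

$$ddW = d(d_1 W + d_2 W) = d_2 d_1 W + d_1 d_2 W = 0$$

as it should be.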
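
The identities (2.14), (2.15) and $ddW = 0$ hold path by path, so they can be checked on a discretized Wiener sheet. The sketch below is an illustration under assumptions that are not in the text: the sheet is sampled on a hypothetical $(n+1) \times (n+1)$ lattice by summing independent Gaussian cell increments, and all function names are made up for the example.

```python
import random


def wiener_sheet(n, cell_area, rng):
    """W on an (n+1) x (n+1) lattice, built by summing independent
    Gaussian cell increments (a discretized white noise eta)."""
    w = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            eta_cell = rng.gauss(0.0, cell_area ** 0.5)
            w[i][j] = w[i - 1][j] + w[i][j - 1] - w[i - 1][j - 1] + eta_cell
    return w


def eta(w, i, j, a, b):
    """White noise of the rectangle with corners (i, j) and (i+a, j+b)."""
    return w[i + a][j + b] - w[i + a][j] - w[i][j + b] + w[i][j]


def d1w(w, i, j, a):
    """d_1 W on the horizontal edge from (i, j) to (i+a, j)."""
    return w[i + a][j] - w[i][j]


def d2w(w, i, j, b):
    """d_2 W on the vertical edge from (i, j) to (i, j+b)."""
    return w[i][j + b] - w[i][j]


def d_d1w(w, i, j, a, b):
    """d(d_1 W)(sigma): d_1 W on the boundary, bottom edge minus top edge."""
    return d1w(w, i, j, a) - d1w(w, i, j + b, a)


def d_d2w(w, i, j, a, b):
    """d(d_2 W)(sigma): d_2 W on the boundary, right edge minus left edge."""
    return d2w(w, i + a, j, b) - d2w(w, i, j, b)


rng = random.Random(0)
W = wiener_sheet(8, 1.0 / 64, rng)
```

For any lattice rectangle, `d_d1w` returns `-eta`, `d_d2w` returns `eta`, and their sum vanishes up to rounding, mirroring (2.14), (2.15) and $ddW = 0$.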
Finally, we note that the Hodge star operator $*$ is a linear operator defined on all Ito random currents. Hence $*X$ is well defined as an Ito random current for any r-cochain $X$ considered as an Ito random current. However, $*X$ is not necessarily a cochain (equivalently a differential form) and for many interesting cases it is not. For example, let $\eta$ be an n-cochain representing Gaussian white noise. For $*\eta$ to be a 0-cochain it must be a continuous random function. However

$$\eta(\sigma) = \int_\sigma (*\eta)_t \, dt_1 \cdots dt_n$$

so that $*\eta$ cannot be a continuous random function and hence is not a 0-cochain.

3.  Markovian  Currents 

It  is  well  known  [3]  that  an  isotropic  Gaussian  random  field  with  a  covariance  function  given 
by 

$$E X_t X_s = \int e^{i(u,\, t - s)} \frac{1}{1 + |u|^2} \, du \qquad (3.1)$$

is Markov. Indeed, it is one of the few known examples of Markovian fields. Yet, because $E X_t^2 = \infty$, $X$ is a random current (generalized field) rather than an ordinary field. Insofar as the Markov property requires the consideration of "surface data," how it can be applied to a random current requires a careful interpretation.

Let $D_p$ denote the space of all ordinary p-forms with coefficients that are $C^\infty$ functions with compact support. A random r-current $X$ in $R^n$ is a continuous linear random function on $D_{n-r}$. For a random r-current $X$ and $f \in D_p$ with $p + r \le n$, $X \wedge f$ is well defined by

$$(X \wedge f)(g) = X(f \wedge g) \quad \text{for all } g \in D_{n-p-r} .$$

Now we can define localizable currents and Markovian currents. We say an r-current $X$ is localizable if $X \wedge f$ is a stochastic (n-1)-form for all $f \in D_{n-r-1}$.

Suppose $\Gamma$ is an (n-1)-surface separating $R^n$ into a bounded part $B^-$ and an unbounded part $B^+$. For any current $X$ we can define:

$$\text{past}(X) = \{X(f), \; \text{support}(f) \subset B^-\}$$

$$\text{future}(X) = \{X(f), \; \text{support}(f) \subset B^+\}$$

For a localizable r-current $X$, we can further define

$$\text{present}(X) = \{(X \wedge f)(\sigma), \; f \in D_{n-r-1}, \; \sigma \subset \Gamma\}$$

We say a localizable 0-current $X$ is Markov if for any $\Gamma$, past$(X)$ and future$(X)$ are conditionally independent given present$(X)$. For $1 \le r \le n-1$, we say an r-current $X$ is Markov if both $X$ and $*X$ are localizable and

[past$(X)$, past$(*X)$] and [future$(X)$, future$(*X)$]

are conditionally independent given [present$(X)$, present$(*X)$]. An n-current $X$ is said to be Markov if $*X$ is Markov. Eq. (3.1) yields a 0-current that is Markov.


4.  Exterior  Product 

If $X$ and $Y$ are stochastic differential forms, then any definition of the exterior product $X \wedge Y$ would involve the "product" of generalized processes, not a well defined quantity. For example, if $\eta$ is a white noise n-form and $X$ is a 0-form, then $(X \wedge \eta)(\sigma)$ is a stochastic integral

$$\int_\sigma X_t \, \eta(dt)$$

Thus, to define exterior products requires a generalization of stochastic integration. One way of defining the exterior product is to define martingale forms, and to require that, for martingale p and r forms $X$ and $Y$, the exterior product $X \wedge Y$ be a martingale (p + r) form. Defined this way, the exterior product is a generalization of both the martingale stochastic integral for one-parameter processes and the stochastic integrals of types 1 and 2 of Wong and Zakai [4]. The situation can be summarized as follows:

    n        p    r    X ∧ Y
    n = 1    0    0    product
             0    1    martingale integral
    n = 2    0    0    product
             0    1    martingale integral on paths
             0    2    type 1 integral
             1    1    type 2 integral

Details of how $X \wedge Y$ is defined can be found in [5].

ABSTRACT. Typically, modern stochastic control theory uses ideal white noise driven systems (Ito equations), and if the observed data is corrupted by noise, that noise is usually assumed to be 'white Gaussian'. If the models are
linear,  a  Kalman-Bucy  filter  is  then  used  to  estimate  the  state,  and  a  control 
based  on  this  estimate  is  computed.  Actually,  the  noise  processes  are  rarely 
’white’,  and  the  system  is  only  approximated  in  some  sense  by  a  diffusion.  But, 
owing  to  lack  of  ’computable’  alternatives,  one  still  uses  the  above  procedure. 
Then  the  ’filter’  estimates  might  be  quite  far  from  being  optimal.  We  examine 
the sense in which such estimates are useful, in order to justify the use of
the  commonly  used  procedure.  For  the  filtering  problem  where  the  signal  is  a 
’near’  Gauss-Markov  process  and  the  observation  noise  is  wide  band,  it  is 
shown  that  the  usual  filter  is  ’nearly  optimal’  with  respect  to  a  very  natural 
class  of  alternative  data  processors.  The  asymptotic  (in  time  and  bandwidth) 
problem  is  treated,  as  is  the  conditional  Gaussian  case. 

The  paper  is  an  outline  of  some  of  the  work  reported  in  [9]. 

I. INTRODUCTION. Typical models in modern filtering theory are of the following type, where the $W(\cdot)$ are standard Wiener processes, $u(\cdot)$ is a control, and $b_z$, $\sigma_z$, etc., are appropriate functions. We let $z(\cdot)$ denote a reference signal and $y(\cdot)$ the noise corrupted observation.

(1.1) $dz = b_z(z)\,dt + \sigma_z(z)\,dW_z$

(1.2) $dy = h(x,z)\,dt + dW_y$

The actual physical system, which we denote by $z^\epsilon(\cdot)$, $y^\epsilon(\cdot)$, is not of the form (1.1) - (1.2). The reference signal $z^\epsilon(\cdot)$ might be only approximately representable by (1.1), and the noise in the control and observation system would rarely be 'white'. But, via some approximation or identification procedure, one chooses a model of the form (1.1) - (1.2), then computes a good filter for that model, and then applies this filter to the actual physical system. One must question the value of the filter output when applied to the 'physical' problem. The filter output might not be even nearly optimal for use in making estimates of $z^\epsilon(\cdot)$. Such questions are basic to the relevance of much theoretical work. We will deal with these questions here, when the approximating system (1.1), (1.2) is linear - for which a fairly complete theory can be obtained.


Owing to the usual lack of 'near optimality' (when applied to the
physical  system)  of  the  filter  which  is  obtained  by  using  (1.1)  -  (1.2),  one 
should  ask  the  question:  with  respect  to  which  alternative  filters  (called  ’data 
processors’  below)  for  the  physical  system  is  the  chosen  one  nearly  optimal? 
It  turns  out  that  this  alternative  class  of  filters  is  quite  large  and  quite 
reasonable.  The  basic  mathematical  techniques  used  here  are  those  of  the 
theory  of  weak  convergence  of  probability  measures  [1],  [3],  [4],  a  technique 
which  is  quite  useful  for  problems  in  the  approximation  of  random  processes 
[1], [5] - [8], [12], [13].


When the ideal model is linear, one would usually use the Kalman-Bucy filter appropriate for the ideal model, but whose input is the physical observation. Obviously, the filter does not usually yield the conditional distribution of $z^\epsilon(t)$ given the data $y^\epsilon(s)$, $s \le t$. In Section 2, we discuss some counterexamples to illustrate the sort of difficulties which arise in such approximations, and in Section 3 the approximation theorem is given, together with the class of alternative data processors. Section 4 concerns the average filter error per unit time - or the errors for large time. The symbol $\Rightarrow$ denotes weak convergence. A fuller development appears in [9], together with the conditional Gaussian case and a treatment of certain non-linear observations. For the weak convergence, we work with the space $D^k[0,\infty)$, the space of $R^k$-valued functions which are right continuous and have left-hand limits, endowed with the Skorohod topology. (See [1], [3], [4].) Reference [2] deals with similar approximations for the non-linear filtering problem, and reference [10] concerns the approximation problem for the non-linear control problem.

II. LINEAR FILTERING: PRELIMINARIES. Consider the following filtering problem: For each $\epsilon > 0$, $z^\epsilon(\cdot)$ is a signal process, $\xi^\epsilon(\cdot)$ is a 'wide-bandwidth' observation noise, and the two are mutually independent. The actual observation process is:

(2.1) $\dot y^\epsilon(t) = H_z z^\epsilon(t) + \xi^\epsilon(t)$, $y^\epsilon(0) = 0$.

All 'noise' processes are assumed to be right continuous and have left-hand limits. Define $y^\epsilon(t) = \int_0^t \dot y^\epsilon(s)\,ds$ and $W_y^\epsilon(t) = \int_0^t \xi^\epsilon(s)\,ds$. Let $z(\cdot)$ satisfy (for matrices $A_z$, etc.)

(2.2) $dz = A_z z\,dt + B_z\,dW_z$.

Since $\xi^\epsilon(\cdot)$ is to be 'nearly' white noise, and $z^\epsilon(\cdot)$ 'nearly' a Gauss-Markov diffusion, let

(2.3) $(z^\epsilon(\cdot), W_y^\epsilon(\cdot)) \Rightarrow (z(\cdot), W_y(\cdot))$ as $\epsilon \to 0$,

where $W_y(\cdot)$ is a non-degenerate Wiener process. The $W_z(\cdot)$ and $W_y(\cdot)$ must be independent. Also $y^\epsilon(\cdot) \Rightarrow y(\cdot)$, where

(2.4) $dy = H_z z\,dt + dW_y$, $y(0) = 0$.


The actual physical system is, of course, 'fixed' and corresponds to some small $\epsilon > 0$. The use of weak convergence here is just a way of embedding the actual data in a sequence, so that an approximation method can be used. The approximation of the values of expectations of functions of $z^\epsilon(\cdot)$, conditioned on the data $y^\epsilon(\cdot)$, is not easy in general. Furthermore, we cannot restrict ourselves to Gaussian noise, since it itself is only an approximation to the physical processes.


For (2.2), (2.4), the filter equations are

(2.5) $d\hat z = A_z \hat z\,dt + Q(t)[dy - H_z \hat z\,dt]$, $\quad Q(t) = \Sigma(t) H_z' R_0^{-1}$

(2.6) $\dot\Sigma = A_z \Sigma + \Sigma A_z' + B_z B_z' - \Sigma H_z' R_0^{-1} H_z \Sigma$,

where $R_0$ = covariance matrix of observation 'noise' $W_y(1)$, which we set to $I$, unless mentioned otherwise. In practice, with signal $z^\epsilon(\cdot)$ and noise $\xi^\epsilon(\cdot)$, one normally uses (2.6) and (2.5WB):

(2.5WB) $\dot{\hat z}^\epsilon = A_z \hat z^\epsilon + Q(t)[\dot y^\epsilon - H_z \hat z^\epsilon]$.

This system is not necessarily even a nearly optimal filter for the physical observation. But, as will be seen, it makes a great deal of sense and is quite appropriate in a specific but important way.
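
To make the role of the Riccati equation (2.6) concrete, here is a minimal sketch for the scalar case with $R_0 = 1$. It is an illustration only: the scalar coefficients `a`, `b`, `h` stand in for $A_z$, $B_z$, $H_z$, the forward-Euler scheme and the function names are assumptions of the example, and nothing here is taken from [9].

```python
import math


def riccati_rhs(sigma, a, b, h):
    """Right side of the scalar form of (2.6) with R_0 = 1."""
    return 2.0 * a * sigma + b * b - (h * sigma) ** 2


def integrate_riccati(sigma0, a, b, h, t_end, dt=1e-3):
    """Forward-Euler integration of the scalar Riccati equation."""
    sigma = sigma0
    for _ in range(int(round(t_end / dt))):
        sigma += dt * riccati_rhs(sigma, a, b, h)
    return sigma


def stationary_variance(a, b, h):
    """Positive root of 2 a S + b^2 - h^2 S^2 = 0."""
    return (a + math.sqrt(a * a + b * b * h * h)) / (h * h)


def stationary_gain(a, b, h):
    """Scalar version of Q = Sigma H' R_0^{-1} at the stationary point."""
    return stationary_variance(a, b, h) * h
```

With, say, `a = -1.0`, `b = 1.0`, `h = 2.0`, the integrated variance settles at the positive root returned by `stationary_variance`, and the closed-loop coefficient `a - stationary_gain(a, b, h) * h` is negative, which is the stability that the large-time analysis relies on.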


Some examples will illustrate the problems that we must contend with, particularly concerning the possible lack of continuity in the optimal estimators as the noise bandwidth goes to $\infty$. Let $(X_n, Y_n)$ be bounded real-valued random variables which converge in distribution to $(X, Y)$. Generally $E(X_n | Y_n) \not\Rightarrow E(X | Y)$. For example, let $X_n = X$, $Y_n = X/n$. Next, let $Z_n = Z_n(Y)$, where $Y$ is a random variable and $(Z_n, Y) \Rightarrow (Z, Y)$. Then $Z$ is not necessarily a function of $Y$, and might even be independent of $Y$, as illustrated by the following:

Let $Y$ be uniformly distributed on [0,1]. Define $Z_n = nY$ for $0 \le Y < 1/n$ and, in general, define $Z_n = (nY - k)$ on $k/n \le Y < (k+1)/n$, $k = 0, 1, \ldots, n-1$. Then $(Z_n, Y) \Rightarrow (Z, Y)$ where $Z$ is independent of $Y$, and both $Z$ and $Y$ are uniformly distributed on [0,1]. Clearly $E(Z_n | Y) \not\Rightarrow E(Z | Y)$ in any sense.
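
The sawtooth example can be explored numerically. In the sketch below (an illustration with hypothetical function names and a seeded Monte Carlo estimate), $Z_n$ is computed as the fractional part of $nY$. A direct calculation gives correlation $1/n$ between $Z_n$ and $Y$, so the pair decorrelates as $n$ grows even though $Z_n$ is always an exact function of $Y$, i.e. $E(Z_n | Y) = Z_n$ while $E(Z | Y) = 1/2$.

```python
import random


def z_n(y, n):
    """Z_n = nY - k on k/n <= Y < (k+1)/n, i.e. the fractional part of nY."""
    return n * y - int(n * y)


def sample_correlation(n, samples, rng):
    """Monte Carlo estimate of corr(Z_n, Y) for Y uniform on [0, 1)."""
    ys = [rng.random() for _ in range(samples)]
    zs = [z_n(y, n) for y in ys]
    mean_y, mean_z = sum(ys) / samples, sum(zs) / samples
    cov = sum((y - mean_y) * (z - mean_z) for y, z in zip(ys, zs)) / samples
    var_y = sum((y - mean_y) ** 2 for y in ys) / samples
    var_z = sum((z - mean_z) ** 2 for z in zs) / samples
    return cov / (var_y * var_z) ** 0.5
```

Calling `sample_correlation(1, 20000, random.Random(1))` returns essentially 1 (since $Z_1 = Y$), while `sample_correlation(1000, 20000, random.Random(1))` is close to zero within Monte Carlo error.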

Even though $W_y^\epsilon(\cdot) \Rightarrow W_y(\cdot)$, a non-degenerate Wiener process, $y^\epsilon(\cdot)$ might contain a great deal more information about $z^\epsilon(\cdot)$ than $y(\cdot)$ does about $z(\cdot)$. See [9] for an example where as $\epsilon \to 0$, we can calculate $z^\epsilon(t)$ nearly exactly from the data $y^\epsilon(\cdot)$. In general we have:

Let $(X_n, Y_n) \Rightarrow (X, Y)$ ($X_n$ real valued, $Y_n$ with values in $R^k$). Then

(2.7) $\overline{\lim}_n \, E[X_n - E(X_n | Y_n)]^2 \le E[X - E(X | Y)]^2$.

In the above examples, the inequality is strict. The examples do caution us to take considerable care in dealing with information processing with wide bandwidth noise disturbances.

III. THE 'APPROXIMATELY OPTIMAL' LINEAR FILTERING PROBLEM. For the ideal filtering problem (2.2), (2.4), the optimal decisions are functions of $\hat z(\cdot)$, $\Sigma(\cdot)$, since these completely determine the conditional distribution. There are no functions of the data which give better estimates. This is not so with estimates based on $\Sigma(\cdot)$, $\hat z^\epsilon(\cdot)$ for the system $z^\epsilon(\cdot)$, $y^\epsilon(\cdot)$. We now define a class of functions of the observed data $y^\epsilon(\cdot)$ with respect to which functions of $\hat z^\epsilon(\cdot)$, $\Sigma(\cdot)$ are 'nearly optimal' for small $\epsilon > 0$. We also need to specify a criterion of comparison; i.e., a cost function. Although we use one particular cost function, the general idea and possible extensions should be clear.

Let $D$ denote the class of measurable functions on $C[0,\infty)$, the space of real valued continuous functions on $[0,\infty)$ (with the topology of uniform convergence on bounded intervals), which are continuous w.p.1 relative to Wiener measure (hence, with respect to the measure of $y(\cdot)$). Let $D_t$ denote the subclass which depends only on the function values up to time $t$. For arbitrary $F(\cdot) \in D$ or in $D_t$, we will use $F(y^\epsilon(\cdot))$ as an alternative estimator of a functional of $z^\epsilon(\cdot)$. The class is quite large.

First, note that $D$ contains all continuous functions and that the $\hat z(\cdot)$ of (2.5) can be written as a continuous function of the integral of the driving force $y(\cdot)$. Thus, continuous functions of $\hat z^\epsilon(\cdot)$ are admissible estimators. Many important functionals are only continuous w.p.1 (relative to Wiener measure). Let $\tau(x(\cdot))$ denote the first time that a closed set $A$ with a piecewise differentiable boundary is reached by $x(\cdot)$. Then the function with values $T \wedge \tau(x(\cdot))$ is in $D_T$ for any $T < \infty$. Thus, our alternative estimators can involve stopping times. This is essential in sequential decision problems, since there the cost function involves first entrance times of a function of $y(\cdot)$ into a decision set.

$D$ and $D_t$ do not contain 'wild' functions such as those involving differentiation. We consider $D$ and $D_t$ as a class of data processors. It seems to contain a large enough class for practical applications when the corrupting noise is 'white'.

We now state the 'model' 'robustness' or 'approximation' result. For a function $q(z)$, we write $(P_t^\epsilon, q)$ for the integral of $q(z)$ with respect to the Gaussian distribution with mean $\hat z^\epsilon(t)$ and covariance $\Sigma(t)$ - the ersatz conditional measure of $z^\epsilon(t)$.

The theorem states that (for small $\epsilon$) the ersatz conditional distribution is 'nearly optimal' with respect to a specific (but broad) class of alternative estimators. The alternative class includes those that make sense to use when the corrupting noise is white. If the noise is wide band, then it might not make sense to exploit its detailed structure and use other 'better' estimators. Doing so might, in practical cases, cause processing errors and other (unmodelled) noise effects.

Theorem 3.1. Assume the conditions on $z^\epsilon(\cdot)$, $W_y^\epsilon(\cdot)$ of Section 2. Then $(z^\epsilon(\cdot), \hat z^\epsilon(\cdot), W_y^\epsilon(\cdot)) \Rightarrow (z(\cdot), \hat z(\cdot), W_y(\cdot))$. Let $F(\cdot) \in D_t$ be bounded, and $q(\cdot)$ bounded continuous and real valued. Then (the limits all exist)

(3.1) $\lim_\epsilon E[q(z^\epsilon(t)) - F(y^\epsilon(\cdot))]^2 \ge \lim_\epsilon E[q(z^\epsilon(t)) - (P_t^\epsilon, q)]^2$.

Remark. The assertion concerning the weak convergence is necessary, since we need to know that the limit of the cited $\epsilon$-triple represents a true filtering problem. The result would not make sense if only 2 out of the 3 components converged.

Proof. By the weak convergence and the w.p.1 continuity of $F(\cdot)$,

$$(q(z^\epsilon(t)), F(y^\epsilon(\cdot)), (P_t^\epsilon, q)) \Rightarrow (q(z(t)), F(y(\cdot)), (P_t, q)),$$

where $(P_t, q) = \int q(z)\, N(\hat z(t), \Sigma(t); dz)$, and $N(\hat z, \Sigma; \cdot)$ is the normal distribution with mean $\hat z$ and covariance $\Sigma$. Thus, the left and right sides of (3.1) converge to, respectively,

(3.2) $E[q(z(t)) - F(y(\cdot))]^2$, $\quad E[q(z(t)) - E[q(z(t)) \,|\, y(s), s \le t]]^2$.

Since the conditional expectation is the optimal estimator, the second expression is no greater than the first. This yields the theorem.

Q.E.D.

IV. FILTERING: THE LARGE TIME PROBLEM (Large $t$, small $\epsilon$). The filtering system often operates over a very long time interval. For the model (2.2), (2.4), or with (2.6), (2.5WB), one would then use the stationary filter. But with the system $y^\epsilon(\cdot)$, $z^\epsilon(\cdot)$, two limits are involved since both $t \to \infty$ and $\epsilon \to 0$, and it is important that the results not depend on how $t \to \infty$ and $\epsilon \to 0$, and that the use of the stationary limit filter is justified. We make some additional assumptions.

C4.1. $A_z$ is stable, $(A_z, H_z)$ is observable and $(A_z, B_z)$ controllable.

C4.2. $\xi^\epsilon(t)$ takes the form $\xi^\epsilon(t) = \xi(t/\epsilon^2)/\epsilon$, where $\xi(\cdot)$ is a second order stationary process with integrable covariance function $R(\cdot)$. Also, if $t_\epsilon \to \infty$ as $\epsilon \to 0$, then $W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon) \Rightarrow W_y(\cdot)$.

Remark. The model (C4.2) is a common way of modelling wide bandwidth noise, and is used to simplify a calculation below, and to avoid the details involved with other models. It can be extended in many ways. We also make the rather unrestrictive assumption that the initial time is not important and that the $z^\epsilon(\cdot)$ processes do not explode:

C4.3. If $\{z^\epsilon(t_\epsilon)\}$ converges weakly to a random variable $z(0)$ as $\epsilon \to 0$, then $z^\epsilon(t_\epsilon + \cdot) \Rightarrow z(\cdot)$ with initial condition $z(0)$. Also

$$\sup_{\epsilon, t} E|z^\epsilon(t)|^2 < \infty .$$
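
The scaling in (C4.2) can be illustrated by computing the variance of the integrated noise $W_y^\epsilon(t) = \int_0^t \xi(s/\epsilon^2)/\epsilon \, ds$. The sketch below assumes, purely for illustration, a zero-mean scalar noise with covariance $R(u) = e^{-|u|}/2$ (so that $\int R(u)\,du = 1$). This choice of $R$ and the function names are not from the text. For this $R$ the variance has a closed form, and it approaches $t$, the variance of a standard Wiener process, as $\epsilon \to 0$.

```python
import math


def covariance(u):
    """Assumed covariance R(u) = exp(-|u|) / 2, which integrates to 1."""
    return 0.5 * math.exp(-abs(u))


def variance_closed_form(t, eps):
    """Variance of the integral of xi(s / eps^2) / eps over [0, t]."""
    big_t = t / eps ** 2
    decay = math.exp(-big_t)
    return t * (1.0 - decay) - eps ** 2 * (1.0 - decay * (1.0 + big_t))


def variance_quadrature(t, eps, steps=20000):
    """Midpoint rule for (2 / eps^2) * int_0^t (t - v) R(v / eps^2) dv."""
    dv = t / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * dv
        total += (t - v) * covariance(v / eps ** 2)
    return 2.0 * total * dv / eps ** 2
```

For example, `variance_closed_form(1.0, 0.01)` differs from 1 by about $10^{-4}$, whereas at `eps = 1.0` the same quantity is $e^{-1}$, far from the white-noise value.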
Consistency. In order that $\hat z(\cdot)$, $\Sigma(\cdot)$ be a filter for $z(\cdot)$, $y(\cdot)$, it is necessary that the initial conditions be consistent. Let $N(\hat z, \Sigma; A)$ denote the probability that the normal random variable (with mean $\hat z$ and covariance $\Sigma$) takes values in the set $A$. By consistency, we mean that $P\{z(0) \in A \,|\, \hat z(0), \Sigma(0)\} = N(\hat z(0), \Sigma(0); A)$. One cannot choose the initial (random) conditions arbitrarily. It should be obvious that if $\Sigma(0) = \Sigma$, the stationary solution of (2.6), and $(z(0), \hat z(0))$ are the stationary random variables for (stable) (2.2) and (2.5), then the initial conditions are consistent.

The question of consistency arises because when we study the asymptotics as $t \to \infty$ and $\epsilon \to 0$, we will start the filter at some large $t_\epsilon$ and do not know a priori what the limits of $(z^\epsilon(t), \hat z^\epsilon(t))$ are. The initial condition of the limit equations must be consistent for the problem to make sense. Fortunately, they will be consistent.

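Theorem 4.1. Assume the conditions of Section 2 and (C4.1) - (C4.3). Let $q(\cdot)$ be bounded and continuous and let $F(\cdot) \in D_t$. Define $y^\epsilon(s) = 0$ for $s \le 0$ and define $y^\epsilon(-\infty, t; \cdot)$ to be the 'reversed' function with values ($0 \le \tau < \infty$) $y^\epsilon(-\infty, t; \tau) = y^\epsilon(t - \tau)$. Then, if $t_\epsilon \to \infty$ as $\epsilon \to 0$,

(4.1) $\{z^\epsilon(t_\epsilon + \cdot), \hat z^\epsilon(t_\epsilon + \cdot), W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon)\} \Rightarrow (z(\cdot), \hat z(\cdot), W_y(\cdot))$

satisfying (2.2), (2.5), and $z(\cdot)$, $\hat z(\cdot)$ are stationary. Also (3.1) holds in the form

(4.2) $\lim_{\epsilon, t} E[q(z^\epsilon(t)) - F(y^\epsilon(-\infty, t; \cdot))]^2 \ge \lim_{\epsilon, t} E[q(z^\epsilon(t)) - (P_t^\epsilon, q)]^2$.

The limit of $(P_t^\epsilon, q)$ is the expectation with respect to the stationary $(\hat z(\cdot), \Sigma)$ system.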

Proof. Suppose that $\{\hat z^\epsilon(t), \epsilon > 0, t < \infty\}$ is tight. Then, by the hypothesis, $\{z^\epsilon(t), \hat z^\epsilon(t), \epsilon > 0, t < \infty\}$ is tight and each subsequence of $\{z^\epsilon(t_\epsilon + \cdot), \hat z^\epsilon(t_\epsilon + \cdot), W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon), t_\epsilon < \infty, \epsilon > 0\}$ has a weakly convergent subsequence with limit satisfying (2.2), (2.5). Choose a weakly convergent subsequence (with $t_\epsilon \to \infty$) also indexed by $\epsilon$ and with limit denoted by $z(\cdot)$, $\hat z(\cdot)$, $W_y(\cdot)$. Suppose, for the moment, that $z(\cdot)$, $\hat z(\cdot)$ is stationary. (Clearly, $\Sigma(t) \to \Sigma$ as $t \to \infty$.) If all limits are stationary, then the subsequence is irrelevant since the stationary solution is unique. Also, since the initial conditions of $z(\cdot)$ and $\hat z(\cdot)$ are consistent (owing to the stationarity), $(\hat z(\cdot), \Sigma)$ is the optimal filter for $y(\cdot)$, $z(\cdot)$. Inequality (4.2) is a consequence of this and the weak convergence.

We next prove tightness of $\{\hat z^\epsilon(t), \epsilon > 0, t < \infty\}$, and then the stationarity will be proved. We have

(4.3) $\dot{\hat z}^\epsilon = [A_z - Q(t)H_z]\hat z^\epsilon + Q(t)\,\xi(t/\epsilon^2)/\epsilon + Q(t)H_z z^\epsilon(t)$.

Let $\Phi(t, \tau)$ denote the fundamental matrix for $[A_z - Q(t)H_z]$. There are $K < \infty$, $\lambda > 0$ such that $|\Phi(t, \tau)| \le K \exp -\lambda(t - \tau)$. We have

$$\hat z^\epsilon(t) = \Phi(t, 0)\hat z^\epsilon(0) + \int_0^t \Phi(t, \tau) Q(\tau)\, \xi(\tau/\epsilon^2)\, d\tau/\epsilon + \int_0^t \Phi(t, \tau) Q(\tau) H_z z^\epsilon(\tau)\, d\tau .$$

A straightforward calculation using (C4.2 - C4.3) and the change of variable $\tau/\epsilon^2 \to \tau$ in the first integral yields

$$E|\hat z^\epsilon(t)|^2 \le \text{constant}\,(1 + E|\hat z^\epsilon(0)|^2),$$

giving the desired tightness.

To  prove  the  stationarity  of  the  limit  of  any  weakly  convergent 
subsequence,  we  need  only  show  stationarity  of  the  limit  values  (z(0),  z(0))  of 
the  (z€(te),  z€(t€)).  For  this,  we  use  a  'shifting*  argument. 

Fix  T  >  0  and  take  a  weakly  convergent  subsequence  of  (indexed  also  by  c, 

€ 

and  with  t€  -•  ") 

{z€(t€+  ),  z€(t<+  ),  W*(t€+  )  -  W*(t€),  z€(t€-T+  ),  zt(tc-T+  ), 

Wj(tf-T+.)  -  Wj(tc-T)) 

wjth  limit  ^denoted  by  (z(  ),  z(  •),  Wy(  ).  M  **(•).  Wy  T(  )).  AWc  have 

zt(T)  -  z'O)  and  Zj(T)  -  z(0).  We  do  not  yet  know  what  zT(0)  or 
zT(0)  are  -  |>ut,  uniformly  in  T,  they  belong  to  a  tight  set,  owing  to  the 
tightness  of  (z€(t),  e  >  0,  t  <  •}.  Write  (where  W§T(*)  ’drives’  the  equation 
for  dzT) 

T 

z(0)  -  zx(T)  -  (exp  A,T)zt(0)  +  J  exp  A,(T-T)-B§dWiT(T) 

o 
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2(0)  -  z^T)  -  (exp  [A,  -  Q(»)H,]T)zt(0) 

T 

+  /  exp  [A,  -  Q(«)H,](T-T)(dWyT(T)  +  H,zT(T)dT) 

0 

Since  T  is  arbitrary  and  the  set  of  all  possible  (zT(0»  is  tight,  the 
stability  of  A§  and  (At  -  Q(*)HI)  implies  that  z(0)  is  the  stationary 
random  variable,  hence  z(  )  is  stationary.  Similarly,  the  pair  (z(),  z(-))  is 
stationary. 

Q.E.D. 


REFERENCES 


[1]  H.J.  Kushner,  Approximation  and  Weak  Convergence  Methods  for  Random 
Processes  with  Applications  to  Stochastic  Systems  Theory ,  M.I.T.  Press, 
Cambridge,  U.S.A.,  1984. 

[2]  H.J.  Kushner  and  Hai  Huang,  "Approximate  and  Limit  Results  for 
Nonlinear  Filters  with  Wide  Bandwidth  Observation  Noise",  Stochastics, 
Feb.,  1986. 

[3]  P.  Billingsley,  Convergence  of  Probability  Measures ,  1968,  Wiley,  New  York. 

[4]  T.G.  Kurtz,  Approximation  of  Population  Processes,  1981,  Vol.  36  in 
CBMS-NSF  Regional  Conf.  Series  in  Appl.  Math,  Soc.  for  Ind.  and  Appl. 
Math,  Phila. 

[5]  A.  Benveniste,  "Design  of  Monostep  and  Multistep  Adaptive  Algorithms 
for  the  Tracking  of  Time  Varying  Systems,"  Proc.,  23  Conf.  on  Dec.  and 
Control,  1984,  IEEE  Publications,  New  York. 

[6]  M.  El-Ansary  and  H.  Khalil,  "On  the  Interplay  of  Singular 
Perturbations  and  Wide-band  Stochastic  Fluctuations",  SIAM  J.  on 
Control,  24,  1986,  83-98. 

[7]  H.J.  Kushner  and  Hai  Huang,  "Averaging  Methods  for  the  Asymptotic 
Analysis  of  Learning  and  Adaptive  Systems  with  Small  Adjustment  Rate", 
SIAM  J.  on  Control  and  Optim.,  19,  (1981),  635-650. 

[8]  H.  Kushner,  "Jump  Diffusion  Approximations  for  Ordinary  Differential 
Equations  with  Wideband  Random  Right  Hand  Sides”,  SIAM  J.  on  Control 
and  Optimization,  17,  1979,  729-744. 

[9]  H.  J.  Kushner  and  W.  Runggaldier,  "Filtering  and  Control  for  Wide 
Bandwidth  Noise  and  ‘Nearly’  Linear  Systems”,  LCDS  Rept.  #86-8,  1986, 
Brown Univ.; to appear in IEEE Trans. on Aut. Control.




[10] H. Kushner and W. Runggaldier, "Nearly Optimal State Feedback
Controls  for  Stochastic  Systems  with  Wideband  Noise  Disturbances",  to 
appear  SIAM  J.  on  Control  and  Optimization.  Also,  LCDS  Rept.  #85-23, 
1985,  Brown  Univ. 

[11]  A.V.  Skorohod,  "Limit  Theorems  for  Stochastic  Processes",  Theory  of 
Probability and Its Applications, 1, 1956, 262-290.

[12] H.J. Kushner, "Diffusion Approximations to Output Processes of
Nonlinear Systems with Wide-band Inputs, and Applications", IEEE Trans.
on Inf. Theory, IT-26, 1980, 715-725.

[13]  G.B.  Blankenship  and  G.C.  Papanicolaou,  "Stability  and  Control  of 
Stochastic  Systems  with  Wide  Band  Noise  Disturbances",  SIAM  J.  Appl. 
Math  34,  1978,  437-476. 




Adaptive  Kalman  Filtering  for  Instrumentation  Radar 


Charles  K.  Chui 

Department  of  Mathematics 
Texas  A  &  M  University 
College  Station,  Texas  77843 


Robert  E.  Green 

Instrumentation  Directorate 
White Sands Missile Range
New Mexico 88002


ABSTRACT 


The optimization criterion of the adaptive Kalman filter
for instrumentation tracking radars is slightly modified
to change nonlinear matrix equations to linear matrix
operations, and the resulting equations are implemented
with parallel processing for efficient real-time applications.
1. The tracking radar


We only consider monopulse tracking radars. The pencil-beam antenna of a
monopulse tracking radar consists of a reflector and a cluster of four feed horns from
which four single pulses are transmitted simultaneously. In tracking a target, two of
the echo signals are used for determining the magnitude and direction of its azimuthal
angular position, the other two for determining the magnitude and direction of
its elevation angular position, and the four together are used to determine the range
of the target (see [2] and [7] for references). A transmitted signal is described
graphically by a lobe as shown in Fig. 1. If a target happens to be located along the
beam axis, the voltage response measured from the echo signal at the tracking-radar
receiver will be of highest value since the radiated power is concentrated in the
direction of the beam axis. If the target is not detected along the beam axis, the
resulting voltage response will be somewhat smaller.


Fig. 1.




A polar representation of the pencil-beam from two horns which are used to determine
the azimuthal or elevation angular position of the target is shown in Fig. 1, and
is translated to rectangular coordinates with voltage as angular measurement in
Fig. 2.


Fig. 2.


The information on the difference $\Delta v$ in amplitude between the voltage responses at
the two positions of two beam-lobes, usually called the error signal, yields a measurement
of the angular displacement (or angular error) of the target from the scan
axis. Hence, both azimuthal and elevation angular positions are observed. In addition,
the tracking radar is designed so that the sum of the echo signals provides the
range measurement of the displacement of the target. In the monopulse tracking
radar we are discussing, the echo signals are combined so that both the sum and the
difference signals are obtained simultaneously. Hence, the range $\Sigma$, the azimuthal
angular error $\Delta A$ and the elevational angular error $\Delta E$ are all obtained simultaneously.
Because the azimuth angle at the $k$th instance is $A_k = a_k + \Delta A_k$, where $a_k$ is
the horizontal angle of the scan axis of the antenna measured from some reference
axis (cf. Fig. 3) and $\Delta A_k$ is the value of $\Delta A$ at the $k$th instance, $A_k$ can be determined
immediately. Similarly, $E_k = e_k + \Delta E_k$, where $e_k$ is the vertical angle of the
scan axis of the antenna measured from the same reference axis (cf. Fig. 3), is also
obtained.


Fig. 3.
The principle of generating the error signal and determining the azimuth angle
$A$, the elevation angle $E$, and the range $\Sigma$ by the monopulse tracking techniques we
discuss here already solves the tracking problem in an environment without any
interference or noise. In practice, however, there are different sources of noise such
as the solar or galactic noise, the ground noise from the environment of the radar and
the electronic or mechanical equipment in the radar itself. Hence, the data $\Sigma$, $\Delta A$
and $\Delta E$ we obtain from the tracking radar system are associated with random errors
which must be filtered out in order to be able to determine the real values of $\Sigma$, $\Delta A$
and $\Delta E$. This real-time problem is usually tackled by applying Kalman filtering.


2.  Kalman  Filtering 


A general linear mathematical model for the control-observation system is given
by

$$\begin{cases} y_{k+1} = A_k y_k + B_k u_k + \Gamma_k \xi_k, \\ w_k = C_k y_k + D_k u_k + \eta_k, \end{cases} \tag{1}$$

where, for each $k = 0, 1, \ldots$, $A_k$, $B_k$, $C_k$, $D_k$, and $\Gamma_k$ are known constant matrices, $E(y_0)$
is given, $\{u_k\}$ is a sequence of predesigned control functions, and $\{\xi_k\}$ and $\{\eta_k\}$ are
white noise processes; that is, $E(\xi_k) = 0$, $E(\xi_k \xi_j^T) = Q_k \delta_{kj}$, $E(\eta_k) = 0$, $E(\eta_k \eta_j^T) = R_k \delta_{kj}$,
and $E(\xi_k \eta_j^T) = 0$, for $k, j = 0, 1, 2, \ldots$. Here, as usual, $\delta_{kj} = 1$ if $k = j$ and
$\delta_{kj} = 0$ if $k \ne j$.

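As is well known, this system can be uncoupled into two systems

$$x_{k+1} = A_k x_k + \Gamma_k \xi_k, \quad x_0 = y_0, \qquad v_k = C_k x_k + \eta_k,$$

and

$$z_{k+1} = A_k z_k + B_k u_k, \quad z_0 = 0, \qquad s_k = C_k z_k + D_k u_k,$$

where $y_k = x_k + z_k$ and $w_k = v_k + s_k$. Now, the state vector $z_k$ can be computed using
the formula

$$z_k = \sum_{i=1}^{k} (A_{k-1} \cdots A_i)\, B_{i-1} u_{i-1}$$

(with the matrix product read as the identity when $i = k$), and the observation vector $v_k$ becomes

$$v_k = w_k - C_k z_k - D_k u_k .$$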
This information is used to estimate $x_k$. That is, we use it in studying the stochastic
state-space decomposition:

$$x_{k+1} = A_k x_k + \Gamma_k \xi_k, \quad x_0 = y_0, \qquad v_k = C_k x_k + \eta_k. \tag{2}$$

Of course, if $\hat x_k$ is the optimal estimate of $x_k$, then $\hat y_k = \hat x_k + z_k$ is the optimal
estimate of $y_k$ in the original system (1).

The stochastic linear system for radar tracking can be described as follows. Let
$\Sigma$, $\Delta A$, $\Delta E$ be the range, the azimuthal angular error, and the elevational angular
error, respectively, of the target, with the radar being located at the origin (cf. Fig. 3),
and consider $\Sigma$, $\Delta A$, and $\Delta E$ as functions of time with first and second derivatives
denoted by $\dot\Sigma$, $\Delta\dot A$, $\Delta\dot E$, $\ddot\Sigma$, $\Delta\ddot A$, $\Delta\ddot E$, respectively. Let $h > 0$ be the sampling time unit
and set $\Sigma_k = \Sigma(kh)$, $\dot\Sigma_k = \dot\Sigma(kh)$, $\ddot\Sigma_k = \ddot\Sigma(kh)$, etc. Then, using the second degree Taylor
polynomial approximation, the radar tracking model takes on the following linear
stochastic state-space description:

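$$x_{k+1} = A x_k + \Gamma_k \xi_k, \qquad v_k = C x_k + \eta_k, \tag{3}$$

where

$$x_k = [\, \Sigma_k \ \ \dot\Sigma_k \ \ \ddot\Sigma_k \ \ \Delta A_k \ \ \Delta\dot A_k \ \ \Delta\ddot A_k \ \ \Delta E_k \ \ \Delta\dot E_k \ \ \Delta\ddot E_k \,]^T ,$$

$$A = \begin{bmatrix}
1 & h & m & & & & & & \\
0 & 1 & h & & & & & & \\
0 & 0 & 1 & & & & & & \\
 & & & 1 & h & m & & & \\
 & & & 0 & 1 & h & & & \\
 & & & 0 & 0 & 1 & & & \\
 & & & & & & 1 & h & m \\
 & & & & & & 0 & 1 & h \\
 & & & & & & 0 & 0 & 1
\end{bmatrix}, \qquad m = h^2/2,$$

$$C = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{bmatrix},$$

so that

$$v_k = C x_k + \eta_k = \begin{bmatrix} \Sigma_k \\ \Delta A_k \\ \Delta E_k \end{bmatrix} + \eta_k ,$$

and $\{\xi_k\}$ and $\{\eta_k\}$ are white noise processes. This is a time-invariant model of (2).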
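As an illustration only, the block-diagonal structure of $A$ and the selection structure of $C$ in (3) can be assembled numerically. The sketch below assumes NumPy; the function name `radar_model` and the sample value of `h` are hypothetical and do not come from the paper.

```python
import numpy as np

def radar_model(h):
    """Return the 9x9 transition matrix A and the 3x9 observation matrix C of (3)."""
    m = h**2 / 2.0
    # 3x3 second-degree Taylor block shared by range, azimuth error, elevation error
    block = np.array([[1.0, h,   m],
                      [0.0, 1.0, h],
                      [0.0, 0.0, 1.0]])
    A = np.kron(np.eye(3), block)                        # block-diagonal, 9x9
    C = np.kron(np.eye(3), np.array([[1.0, 0.0, 0.0]]))  # picks entries 0, 3, 6
    return A, C

A, C = radar_model(h=0.1)   # h = 0.1 is an arbitrary sample sampling time
print(A.shape, C.shape)
```

Because both matrices are Kronecker products with the 3x3 identity, the 9-dimensional model separates into three independent 3-dimensional models, one per measured quantity.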

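Moreover, if we let

$$x_k = \begin{bmatrix} x_k^1 \\ x_k^2 \\ x_k^3 \end{bmatrix}, \quad
x_k^1 = \begin{bmatrix} \Sigma_k \\ \dot\Sigma_k \\ \ddot\Sigma_k \end{bmatrix}, \quad
x_k^2 = \begin{bmatrix} \Delta A_k \\ \Delta\dot A_k \\ \Delta\ddot A_k \end{bmatrix}, \quad
x_k^3 = \begin{bmatrix} \Delta E_k \\ \Delta\dot E_k \\ \Delta\ddot E_k \end{bmatrix},$$

$$\xi_k = \begin{bmatrix} \xi_k^1 \\ \xi_k^2 \\ \xi_k^3 \end{bmatrix}, \quad
\eta_k = \begin{bmatrix} \eta_k^1 \\ \eta_k^2 \\ \eta_k^3 \end{bmatrix}, \quad
v_k = \begin{bmatrix} v_k^1 \\ v_k^2 \\ v_k^3 \end{bmatrix},$$

$$\tilde A = \begin{bmatrix} 1 & h & m \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}, \qquad \tilde C = [\,1 \ \ 0 \ \ 0\,],$$

and assume that

$$\Gamma_k = \begin{bmatrix} \Gamma_k^1 & & \\ & \Gamma_k^2 & \\ & & \Gamma_k^3 \end{bmatrix}, \quad
Q_k = \begin{bmatrix} Q_k^1 & & \\ & Q_k^2 & \\ & & Q_k^3 \end{bmatrix}, \quad
R_k = \begin{bmatrix} R_k^1 & & \\ & R_k^2 & \\ & & R_k^3 \end{bmatrix},$$

where $\Gamma_k^i$ are 3x3 submatrices, $Q_k^i$ are 3x3 nonnegative definite symmetric submatrices,
and $R_k^i$ are 3x3 positive definite symmetric submatrices, for $i = 1, 2, 3$, then
system (3) can be split into three subsystems:

$$x_{k+1}^i = \tilde A x_k^i + \Gamma_k^i \xi_k^i, \qquad v_k^i = \tilde C x_k^i + \eta_k^i, \qquad i = 1, 2, 3.$$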

Hence, for our radar tracking problem, it is sufficient to study the following system:

$$x_{k+1} = A x_k + \Gamma_k \xi_k, \qquad v_k = C x_k + \eta_k, \tag{4}$$

where, for each $k$, $x_k$ and $\xi_k$ are 3-vectors, $v_k$ and $\eta_k$ are scalars,

$$A = \begin{bmatrix} 1 & h & m \\ 0 & 1 & h \\ 0 & 0 & 1 \end{bmatrix}, \qquad m = h^2/2, \qquad C = [\,1 \ \ 0 \ \ 0\,],$$

and $R_k$ is a scalar.


Now, suppose that the appropriate statistical properties of the noise sequences
$\{\xi_k\}$, $\{\eta_k\}$ and the initial state $x_0$ are known. More precisely, let

$$E(\xi_k) = 0, \qquad E(\eta_k) = 0,$$
$$E(\xi_k \xi_j^T) = Q_k \delta_{kj}, \qquad E(\eta_k \eta_j) = R_k \delta_{kj}, \qquad E(\xi_k \eta_j) = 0,$$
$$E(x_0 \xi_k^T) = 0, \qquad E(x_0 \eta_k) = 0,$$

where $Q_k$ and $R_k$ as well as $E(x_0)$ and $\operatorname{Var}(x_0)$ are given. Then the Kalman filter can
be described by the following recursive formulae (cf., for example, Anderson and
Moore [1] or Chui and Chen [3]):


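$$\hat x_{k|k} = \hat x_{k|k-1} + G_k (v_k - C \hat x_{k|k-1}), \qquad \hat x_{0|0} = E(x_0), \tag{5.1}$$

$$\hat x_{k|k-1} = A \hat x_{k-1|k-1}, \tag{5.2}$$

$$G_k = P_{k,k-1} C^T (C P_{k,k-1} C^T + R_k)^{-1}, \tag{5.3}$$

$$P_{k,k-1} = A P_{k-1,k-1} A^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T, \tag{5.4}$$

$$P_{k,k} = P_{k,k-1} - P_{k,k-1} C^T (C P_{k,k-1} C^T + R_k)^{-1} C P_{k,k-1}, \qquad P_{0,0} = \operatorname{Var}(x_0). \tag{5.5}$$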
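The recursion (5.1)-(5.5) for the 3-state subsystem can be sketched as follows. This is a minimal illustration assuming NumPy, a time-invariant $\Gamma_k = I$, and constant $Q$ and $R$; the function names `kalman_step` and `track`, and the choice $\Gamma_k = I$, are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def kalman_step(x_est, P_est, v, A, C, Gamma, Q, R):
    """One cycle of the standard Kalman filter for a scalar measurement v."""
    x_pred = A @ x_est                                # (5.2) one-step prediction
    P_pred = A @ P_est @ A.T + Gamma @ Q @ Gamma.T    # (5.4) prediction covariance
    S = C @ P_pred @ C.T + R                          # innovation variance, shape (1, 1)
    G = P_pred @ C.T @ np.linalg.inv(S)               # (5.3) Kalman gain, shape (3, 1)
    x_new = x_pred + G @ (v - C @ x_pred)             # (5.1) measurement update
    P_new = P_pred - G @ C @ P_pred                   # (5.5) updated covariance
    return x_new, P_new, G

def track(measurements, h, x0, P0, Q, R):
    """Run the filter over a sequence of scalar measurements v_1, v_2, ..."""
    m = h**2 / 2.0
    A = np.array([[1.0, h, m], [0.0, 1.0, h], [0.0, 0.0, 1.0]])
    C = np.array([[1.0, 0.0, 0.0]])
    Gamma = np.eye(3)                                 # assumption: Gamma_k = I
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    for v in measurements:
        x, P, _ = kalman_step(x, P, v, A, C, Gamma, Q, R)
    return x, P
```

For noise-free samples of a constant-acceleration trajectory, the second-degree Taylor model is exact, so with a vague prior (large `P0`) the estimate returned by `track` should approach the true position, velocity and acceleration at the last sample time.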

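In tracking the azimuthal and elevational angles, that is, when

$$\hat x_{k|k} = \begin{bmatrix} \widehat{\Delta A}_k \\ \widehat{\Delta \dot A}_k \\ \widehat{\Delta \ddot A}_k \end{bmatrix}
\quad \text{and} \quad
\hat x_{k|k} = \begin{bmatrix} \widehat{\Delta E}_k \\ \widehat{\Delta \dot E}_k \\ \widehat{\Delta \ddot E}_k \end{bmatrix},$$

respectively, are obtained, it is advisable to use the filtered outputs

$$\hat A_k = a_k + \widehat{\Delta A}_k$$

and

$$\hat E_k = e_k + \widehat{\Delta E}_k$$

to give meaningful tracking images. Of course, $a_k$ and $e_k$ are simply the horizontal
and vertical angles of the scan axis as shown in Fig. 3.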
3. Adaptive Kalman Filtering

The statistical properties of the noise sequences $\{\xi_k\}$, $\{\eta_k\}$ and the initial state $x_0$
in the Kalman filter discussed above are assumed to be given before the process is performed.
In practice, however, these statistical properties are usually unknown and
even unpredictable. Hence, adaptive filtering is essential. This means that we must
estimate the statistical properties at each stage so that Kalman filtering can be performed
using these estimates. In this note, we assume that the initial statistical properties
$E(x_0)$, $\operatorname{Var}(x_0)$, $Q_0$ and $R_0$ of the state and noise sequences are known so that
the filtering process can get started. The adaptive filter associated with system (3)
can be described by

$$\hat x_k = A \hat x_{k-1} + \hat G_k (v_k - CA \hat x_{k-1}), \qquad \hat x_0 = E(x_0), \tag{6}$$

where $\hat G_k$ is a real-time estimate of the gain matrix $G_k$ at the $k$th instance, in the
sense that $\hat x_k = \hat x_k(\hat G_k)$ satisfies

$$\operatorname{tr} \|\hat x_k(\hat G_k) - x_k\|^2 = \min_{G} \operatorname{tr} \|\hat x_k(G) - x_k\|^2, \tag{7}$$

where, for any 3-dimensional random vector $z$, $\|z\|^2 = \langle z, z \rangle = \operatorname{Var}(z)$. However, the
computation of $\hat x_k$ defined in (7) is extremely complicated and is not suitable for
real-time problems (cf. Chui and Chen [3]). Instead, we will adopt the following
slightly weaker optimality criterion:

$$\operatorname{tr} \|\hat x_k(\hat G_k) - x_{k-3}\|^2 = \min_{G} \operatorname{tr} \|\hat x_k(G) - x_{k-3}\|^2. \tag{8}$$

It  should  be  remarked  that  estimation  of  x*_,  by  xt  has  been  studied  in  a 
different  situation.  For  instance,  estimation  of  x*  _j  by  x*  was  done  in  Jazwinski  [4] 
in  non-adaptive  Kalman  filtering  with  colored  input 

To see that (8) can be used instead of (7) without too much loss in the order of
estimation, we have the following

LEMMA 1 The error variance $\|\hat x_k - x_k\|^2$ is equivalent to $\|\hat x_k - x_{k-3}\|^2$ in the
sense that

$$\tfrac{1}{2} \|\hat x_k - x_k\|^2 - C_k \le \|\hat x_k - x_{k-3}\|^2 \le 2 \|\hat x_k - x_k\|^2 + 2 C_k,$$

where the constant $C_k = \|x_k - x_{k-3}\|^2$ depends only on $k$.

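Next, using the notation

$$N_{CA} = \begin{bmatrix} C \\ CA \\ CA^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & h & m \\ 1 & 2h & 4m \end{bmatrix}, \quad m = h^2/2, \quad \text{and} \quad
v_{k-1,k-3} = \begin{bmatrix} v_{k-3} \\ v_{k-2} \\ v_{k-1} \end{bmatrix},$$

we have the following

LEMMA 2

$$\|\hat G_k (v_k - CA \hat x_{k-1}) - (N_{CA}^{-1} v_{k-1,k-3} - A \hat x_{k-1})\|^2 = \|\hat x_k - x_{k-3}\|^2 + D_k,$$

where $D_k$ is a constant symmetric matrix depending only on $k$.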
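As a quick numerical check, the matrix $N_{CA}$ with rows $C$, $CA$, $CA^2$ can be built directly and compared with the closed-form inverse listed in Step 1 of Section 4. The sketch assumes NumPy; the function names `n_ca` and `n_ca_inverse` are hypothetical.

```python
import numpy as np

def n_ca(h):
    """Stack C, CA, CA^2 for the 3-state second-degree Taylor model."""
    m = h**2 / 2.0
    A = np.array([[1.0, h, m], [0.0, 1.0, h], [0.0, 0.0, 1.0]])
    C = np.array([[1.0, 0.0, 0.0]])
    return np.vstack([C, C @ A, C @ A @ A])

def n_ca_inverse(h):
    """Closed-form inverse of N_CA as listed in Step 1 of the procedure."""
    return np.array([[1.0,            0.0,         0.0],
                     [-3.0 / (2 * h), 2.0 / h,    -1.0 / (2 * h)],
                     [1.0 / h**2,    -2.0 / h**2,  1.0 / h**2]])

h = 0.5  # arbitrary sample sampling time
print(np.allclose(n_ca(h) @ n_ca_inverse(h), np.eye(3)))
```

The rows of the inverse are the familiar one-sided finite-difference weights: applied to three consecutive noise-free position samples, they recover position, velocity and acceleration at the first sample.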

In view of Lemmas 1 and 2, instead of using the criterion (7), we will consider
the minimization problem:

$$\min \operatorname{tr} \|\hat G_k (v_k - CA \hat x_{k-1}) - (N_{CA}^{-1} v_{k-1,k-3} - A \hat x_{k-1})\|^2, \tag{9}$$

which is equivalent to (8). Under this criterion, we have the following result.

THEOREM 1 Let $\hat G_k$ be a solution of the minimization problem (9). Then $\hat G_k$
is uniquely determined by

$$\hat G_k = [\operatorname{Var}(v_k - CA \hat x_{k-1})]^{-1} N_{CA}^{-1} E[(v_{k-1,k-3} - N_{CA} A \hat x_{k-1})(v_k - CA \hat x_{k-1})]. \tag{10}$$

To give a recursive algorithm for computing $\hat G_k$, the following lemma is necessary.
LEMMA 3 The estimate $\hat G_k$ of $G_k$ in (10) can be rewritten as


G i  - 


CP i  j.  -,C  +  Rk  _j 


l Ji-u  +  NfjMi-u1Cr  , 


where 


Jt-xi  =£((x1-3-Axjt_J)(x1  -Ax*.,)7"), 


k -3 


=  E  (  Cf* -361-3  +  7j*_; 


(x*  -  Ail  - 1 ¥ )  . 


and 


Pm- 1  =  IN*  -  Ax*_,||2  . 

Using  this  lemma,  we  can  derive  the  following  recursive  computational  scheme 

Gt  -  -r— — U*-3,t  +  NaiMt-n ]  CT  , 
n*-i 

-  Var(xo)  CT 

Go  ~  •  , 

C  Var  (xo)  C  T  +  R0 

where 

Pi  x  -1  =  Fk  -1 P t  -u  -2P1-1  +  I"*  -1  Qi  -iLf -1  +  AGk  _,**  _,G4r_jAr 

P ifl  *  A  Vor(xo)  At  +  roGor0r  . 

h  -V  =  AJk  -4jt  -iPkr-i  +  AGt  _,[/»*  -,G*r_,  -  CPt  -u  _2JA  r  +  r*  ^Qk  , 

k  >4, 

Jt-3jt  =  AJ 1  —4j  _j F[ _i  +  AGt  — |A*  -iG/ _j  A  t  —Gk  -\CPk  _jA  T  ,  k  =  2,  3, 
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Fl-x  , 


J  »  . 


M  -2.1  = 


M  _u  = 


0  0  0 
0  0  0 
—  RoGoA7 


0  0  0 
0  0  0 
RfiW 


M. 


2.1 


P  rod0r0r 


ooo 

ooo 


+ 


-j 

(C4  r<  _3<2x  -3r*  -3  +  cr(  .2t[.2f1.x  ]  f[,2 


Fl-X 


k  >  3, 


with 


A/  = 


m2l 

m3, 


m„  m2J 
m  32  m33 


Fj  =  AU  -6,0,  j  =0,1,...,*  , 


and 


/j,  =  CP j'j-\CT  +  ,  y  =1,2 . *  . 


Here  &  and  are  determined  by  the  relation  : 


r i  Qx  F[  =  Ptji-i  —  Ft  Ptjt~iF[  —  ACt  Rt  G[Ar  , 


and 


Rl 


6[6t 


g[  -  c 


Pi,-xCT 
We remark that we are supposed to know the initial conditions $E(x_0)$, $\operatorname{Var}(x_0)$, $Q_0$
and $R_0$ in order to apply this algorithm. However, even if they are unknown, any
rough prior estimates of them would do the job. As the process is being performed,
we still have a near-optimal adaptive filtering.


4.  Systolic  Implementation 


The near-optimal adaptive Kalman filter for our radar-tracking system is given
by

$$\hat x_k = A \hat x_{k-1} + \hat G_k (v_k - CA \hat x_{k-1}), \qquad \hat x_0 = E(x_0),$$

where the adaptive Kalman gain $\hat G_k$ is obtained using the following procedure:


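Step 1 : Start with $E(x_0)$, $\operatorname{Var}(x_0)$, $\hat Q_0 = Q_0$, $\hat R_0 = R_0$ and

$$N_{CA}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3/(2h) & 2/h & -1/(2h) \\ 1/h^2 & -2/h^2 & 1/h^2 \end{bmatrix}.$$

Step 2 : Set $\hat G_0 = \dfrac{\operatorname{Var}(x_0)\, C^T}{C \operatorname{Var}(x_0)\, C^T + R_0}$.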


Step  3  :  Compute 


£1.0  =  4  Var(x<))  Ar  +  roQ0r«  , 


J  _2.j  ~  03x3  , 


and 


M  -2.1  = 


0  0  0 
0. 0.  o 
-r<pW 


Step  4  :  Compute  hi=CP , i0Cr  +  R0 • 

Step  5  :  Set  G,  =  ^-  U  -z.,  +  NrfM  .Zl]  CT  . 
Step 6 : Compute $F_1 = A(I - \hat G_1 C)$.
Step  7  :  Compute 


and 


R,  = 


Ci 


c  [g  i 


-c 


PuC' 


r,djrr  =  p i,o -ag,*,gM'  • 


Step  8  :  Compute 


P zi  *  F>Pi.oF\  +  r,Q,r,r  +  AGtRfiW  , 
y_u  =  Ay_2.1ff  +MG,[/i,Gf  -cp^at  , 


and 


M  -1,2 


0  0  0 
o.  0.0 

+ 

M-2. 1 

- RiGW 

croGoTo7- 

Step  9  :  Compute  h2  =  CPUCT  +  Rt . 

Step  10  :  Set  G2  =  J-  [J  _u  +  Nc~AlM.u]  CT  . 

Step 11 : Compute $F_2 = A(I - \hat G_2 C)$.

Step  12 :  Compute 


A 

*2 


*>ZiCr  . 


and 
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*  p2.i  -  f 2pZiF \  -ag2x&2a  . 


Step 13 : For $k \ge 3$, repeat the following.

( 1 )  Compute 

A  i  -1  -  Ft  -,Pt  -u  +  r ,  _,r/_,  +  Ain  •'  , 

J,'u  =AJ‘^-‘r‘-<  +ag*-M,-,gL,  -cpl_u_iW  +r.^e,_r , 

with  Q_  1=0  and 


Mt.u  = 


0  0  0 
.0  0.0 
Ri  -iG[ A  T 


M 


i  -'W  -I 


lCA  !r‘  -&■  -JU  +  ca  r,  ]f/_, 


*7-, 


(2)  Compute 


hk  =C/’0-iCr  +4.,. 


(3)  Compute 


(4)  Compute 


F*  =  4(7  -G*C), 


(5)  Compute 


/?/  = 


G<r 


-C 


A  A 

Gfa 


^M  -«Cr  . 
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and 


r.  Qi  T l  -,H  -  -4G;  Rx  C[ A 7  . 

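We again remark that when

$$\hat x_{k|k} = \begin{bmatrix} \widehat{\Delta A}_k \\ \widehat{\Delta \dot A}_k \\ \widehat{\Delta \ddot A}_k \end{bmatrix}
\quad \text{and} \quad
\hat x_{k|k} = \begin{bmatrix} \widehat{\Delta E}_k \\ \widehat{\Delta \dot E}_k \\ \widehat{\Delta \ddot E}_k \end{bmatrix}$$

are considered, we always use the filtered information

$$\hat A_k = a_k + \widehat{\Delta A}_k$$

and

$$\hat E_k = e_k + \widehat{\Delta E}_k ,$$

respectively.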

A  flow  chart  is  given  below. 
We conclude this note by discussing the parallel processing implementation of our
adaptive Kalman filtering. Three sets of processors are simultaneously used. The
number of operations of matrix-matrix multiplications etc. in each set of processors is
listed in the following table, where all matrices and vectors are 3-dimensional. Systolic
arrays can be used here to perform fast parallel operation (cf. [5,6,8,9]).


                                   Number of Operations
Type of Operations                 Set 1    Set 2    Set 3
matrix-matrix multiplication         4        3        4
matrix-vector multiplication         4        2        0
vector-vector multiplication         1        4        0
matrix addition                      1        1        3
vector addition                      0        1        0
scalar multiplication                1        1        1
scalar addition                      3        1        0
OPTIMAL IMPULSE-CORRECTION OF A RANDOM LINEAR OSCILLATOR*

P.L.  Chow  and  J.L.  Menaldi 
Department  of  Mathematics,  Wayne  State  University 
Detroit,  Michigan  48202 

ABSTRACT .  Consider  the  impulse-correction  problem  to  minimize  the 
randomly  excited  vibrations  in  a  simple  mechanical  system.  The  system  is 
modeled  by  a  damped  linear  oscillator  under  a  white-noise  perturbation. 
By  the  dynamic  programming  approach,  a  set  of  variational 
(quasi-variational)  inequalities  are  derived  for  the  optimal  cost 
function.  Some  analytical  properties  of  the  optimal  cost  function  and  the 
optimal  control  law  are  described.  A  numerical  approximation  procedure  is 
proposed  for  computing  the  optimal  cost  functions.  It  is  an  iteration 
procedure  which  is  shown  to  be  convergent  and  stable.  Some  numerical 
results  are  given. 

I. INTRODUCTION. The control of undesirable vibrations in a
mechanical or electrical system is a problem of practical interest. For
instance, in the design of a light-weight robotic arm, the vibration of the
flexible arm in the presence of external noise must be reduced to an
acceptable level for satisfactory performance. In its simplest form, a
lumped parameter model is given by the optimal correction of a damped
linear oscillator excited by white noise. In an earlier paper [1], we
studied this kind of problem, where the control process is either
continuous or with jump discontinuities. The numerical solution of such
problems was briefly discussed in [2] and was described in detail in a
subsequent paper [3].

*This  work  was  supported  by  the  ARO  contract  DAAG  29-83-K-0014. 
From the practical viewpoint, a continuous or piecewise continuous control
process is more difficult to implement. This has led us to investigate
the possibility of applying impulsive controls which, under suitable
assumptions, require only a finite number of switching actions and are
thus much easier to implement.

The  optimal  impulse-correction  problem  was  treated  by  Gorbunov  [4], 
among  others,  by  the  minimax  principle.  In  contrast  we  shall  analyze  the 
problem  by  the  dynamic  programming  principle  and  the  associated 
quasi-variational  inequalities.  For  the  general  mathematical  techniques 
involved, one may consult the references [5] and [6].

This  paper  briefly  summarizes  some  preliminary  results  of  our 
investigation  into  this  subject.  Both  analytical  solution  and  the  related 
numerical  approximation  will  be  discussed. 

II. OPTIMAL IMPULSE-CORRECTION PROBLEM. We consider an impulse
control of the undesirable mechanical vibrations in a randomly excited
linear oscillator with damping:

(1) $\ddot x + p \dot x + q^2 x = r \dot w_t + \dot v_t, \quad 0 < t \le T,$

$x(0) = x_0, \quad \dot x(0) = y_0,$

where $p$, $q$ are the damping and spring constants; $x_0$, $y_0$ the initial
position and velocity; $r$ the intensity of the white noise $\dot w_t$, and $\dot v_t$ is an
impulsive control. It is the formal derivative of the jump process:

(2) $v_t = \sum_{i=1}^{\infty} \xi_i H(t - \theta_i),$

where $H(t) = 1$ for $t \ge 0$, $0$ otherwise, being the Heaviside function, and
$\xi_i$ is the correctional impulse applied at the time $\theta_i$. So $v_t$ represents
the total correctional momentum up to the time $t$.
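To make the role of the impulses concrete, the oscillator (1) with a prescribed list of impulses can be simulated by an Euler-Maruyama scheme on the first-order system for position and velocity. This is only a sketch assuming NumPy; the function name `simulate`, the fixed step size, and the convention that an impulse is applied at the first grid time at or after $\theta_i$ are assumptions made for the example, not part of the paper.

```python
import numpy as np

def simulate(x0, y0, p, q, r, T, n_steps, impulses=(), seed=0):
    """Euler-Maruyama path of dx = y dt, dy = -(p y + q^2 x) dt + r dw + dv.

    impulses: iterable of (theta_i, xi_i); each xi_i is added to the velocity
    at the first grid time t_n >= theta_i (hypothetical convention).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x, y = float(x0), float(y0)
    pending = sorted(impulses)
    idx = 0
    xs, ys = [x], [y]
    for n in range(n_steps):
        t = n * dt
        # apply every impulse whose time has been reached: a jump in momentum only
        while idx < len(pending) and pending[idx][0] <= t:
            y += pending[idx][1]
            idx += 1
        dw = rng.normal(0.0, np.sqrt(dt))      # Wiener increment over one step
        x, y = x + y * dt, y - (p * y + q**2 * x) * dt + r * dw
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# Example: one corrective impulse of size -0.5 at time 0.3 (arbitrary sample values)
xs, ys = simulate(x0=1.0, y0=0.0, p=0.2, q=1.0, r=0.1, T=1.0, n_steps=1000,
                  impulses=[(0.3, -0.5)])
```

An impulse changes the velocity instantaneously and leaves the position continuous, which is why the control enters only the second component of the state.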

In order to avoid unnecessary control actions, one may either impose a
penalty for excessive control actions or require a minimum change in state
before applying an impulse correction. These considerations lead to the
following alternative conditions:

(i) the cost function $k(\xi)$ for an impulse magnitude $\xi$ satisfies

(3) $k(\xi) \ge k_0 + k_1 |\xi|$, for some constants $k_0, k_1 > 0$, for all $\xi \in R$,

(ii) there exist constants $\delta_0, \delta_1 > 0$ such that

$|x(\theta_{i+1}) - x(\theta_i)| \ge \delta_0 + \delta_1 |x(\theta_i)|$, for $i = 1, 2, \ldots$.

It seems plausible to anticipate that the condition (i) implies the
condition (ii) for sufficiently small $\delta_0, \delta_1$. For, if it costs to switch
on the control, one will wait until a noticeable change in state occurs
before taking another correctional impulse. Since the condition (i)
causes less technical difficulty, we assume the condition (i) holds in
this paper. The problem under the condition (ii) will be discussed
elsewhere.

Setting $y = \dot x$, the state equation (1) can be rewritten in the
integral form:

(4) $x_t = x_0 + \int_0^t y_s \, ds,$

$y_t = y_0 - \int_0^t (p y_s + q^2 x_s) \, ds + v_t + r w_t,$

where $\{w_t, t \ge 0\}$ is the standard Wiener process in one dimension, and the
control process $v_t$ is given by (2) subject to the condition (3), which
will depend on $\{w_s, s \le t\}$.

For each control policy $v = \{(\xi_i, \theta_i)\}$, let $J$ denote the average cost
function defined by

(5) $J(x,y,t,v) = E_{xy} \Big\{ \int_t^T f(x_s, y_s) \, ds + g(x_T, y_T) + \sum_{i=1}^{\infty} k(\xi_i) H(\theta_i - t) \Big\}, \quad 0 \le t < T, \ x, y \in R,$

where $f$, $g$ denote the running and terminal costs, respectively, and $k(\xi)$
is the cost for impulse control. They are assumed to be positive definite
and bounded. The symbol $E_{xy}$ stands for the conditional expectation with
$x_t = x$, $\dot x_t = y$. Our goal is to find an optimal policy $\hat v$, in the form of a
feedback law, which minimizes the average cost $J$, that is

(6) $u(x,y,t) = J(x,y,t,\hat v) = \inf \{ J(x,y,t,v) : v \}.$

To analyze this problem, we will appeal to the principle of dynamic
programming to derive a set of (quasi)variational inequalities.


III. OPTIMAL COST FUNCTION AND VARIATIONAL INEQUALITIES. By the
dynamic programming approach [6], it is possible to derive a set of
variational inequalities governing the optimal cost function $u$. First,
when there is no impulse at time $t$, $u$ must satisfy the differential
inequality:

(7) $-\dfrac{\partial u}{\partial t} + A u \le f, \quad 0 \le t < T, \ -\infty < x, y < \infty,$

$u|_T = g, \quad -\infty < x, y < \infty,$

where

(8) $A u = -\dfrac{r^2}{2} \dfrac{\partial^2 u}{\partial y^2} + (q^2 x + p y) \dfrac{\partial u}{\partial y} - y \dfrac{\partial u}{\partial x},$

and $u|_T = u(\cdot, \cdot, T)$.

If we decide to produce an impulse at $t$ and then proceed with whatever is
optimal, we get

(9) $u \le M u$ for $0 \le t < T$, $-\infty < x, y < \infty$,

where

(10) $M u(x,y,t) = \inf \{ k(\xi) + u(x, y + \xi, t) : |\xi| < \infty \}.$
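On a grid in $y$ (with $x$ and $t$ held fixed), the operator $M$ in (10) reduces to a minimum over admissible jumps between grid points. The following sketch assumes NumPy, a uniform grid with spacing `dy`, and the specific cost $k(\xi) = k_0 + k_1 |\xi|$, that is, equality in condition (3); the function name `intervention_operator` and this choice of $k$ are assumptions for illustration only.

```python
import numpy as np

def intervention_operator(u, dy, k0, k1):
    """Discrete (Mu)_j = min over j' of k((j' - j) dy) + u_{j'} on a uniform y-grid.

    u  : 1-D array of values u(x, y_j, t) for fixed x and t
    k0 : fixed cost per impulse (assumed > 0)
    k1 : proportional cost per unit impulse size
    """
    n = len(u)
    jumps = np.arange(n)[None, :] - np.arange(n)[:, None]   # jumps[j, j'] = j' - j
    cost = k0 + k1 * np.abs(jumps) * dy                      # k(xi) with xi = (j' - j) dy
    return np.min(cost + u[None, :], axis=1)

# Example with u(y) = y^2 on y = -1, -0.5, 0, 0.5, 1 (arbitrary sample data)
u = np.array([1.0, 0.25, 0.0, 0.25, 1.0])
Mu = intervention_operator(u, dy=0.5, k0=0.1, k1=0.2)
continuation = u < Mu       # grid points where no impulse is applied
print(Mu, continuation)
```

Because $k_0 > 0$, a zero jump always costs more than doing nothing, so grid points where $u$ is already small satisfy $u < Mu$ and belong to the region where the system is left to evolve freely.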

Since at each instant one must decide on one of these two options, one of
the two inequalities (7) and (9) must be an equation. Summing up, the
optimal cost function $u$ is the solution of the following optimality
system:

(11) $-\dfrac{\partial u}{\partial t} + A u \le f,$

$u \le M u, \quad 0 \le t < T, \ -\infty < x, y < \infty,$

$\Big( -\dfrac{\partial u}{\partial t} + A u - f \Big) (u - M u) = 0,$

$u|_T = g, \quad -\infty < x, y < \infty.$

We note that both sides of the inequality (9) involve the unknown
function $u$. This is known as a quasi-variational inequality, in contrast
with the variational inequality (13) below.

To solve the system (11), we may proceed by a successive
approximation procedure. Define a sequence of approximate cost functions
$\{u^n\}$ as follows:

(12) $-\dfrac{\partial u^0}{\partial t} + A u^0 = f, \quad 0 \le t < T,$

$u^0|_T = g, \quad -\infty < x, y < \infty,$

and, for $n \ge 1$,

(13) $-\dfrac{\partial u^n}{\partial t} + A u^n \le f,$

$u^n \le M u^{n-1}, \quad 0 \le t < T, \ -\infty < x, y < \infty,$

$\Big( -\dfrac{\partial u^n}{\partial t} + A u^n - f \Big) (u^n - M u^{n-1}) = 0,$

$u^n|_T = g, \quad -\infty < x, y < \infty.$

Under some suitable assumptions on the functions $f$, $g$ and $k$ it can be
shown that the sequence of approximations $u^n$ converges pointwise to the
optimal cost $u$. In fact we have the following estimate

$0 \le u^n(x,y,t) - u(x,y,t) \le \delta^n, \quad n = 1, 2, \ldots,$

where $\delta$ is a constant with $0 < \delta < 1$. This suggests an iterative numerical
method of solution which will be described in the next section.
To indicate the optimal control law, we define the continuation set

$C = \{ (x,y,t) : u(x,y,t) < (Mu)(x,y,t) \},$

and denote the associated free boundary of $C$ by $\Gamma$. Also we let $\hat\xi = \hat\xi(x,y,t)$ satisfy

$(Mu)(x,y,t) = k(\hat\xi) + u(x, y + \hat\xi, t).$

Then, starting from the continuation set $C$, the system evolves freely in $C$
until it first reaches the point $(x,y)$ on the boundary $\Gamma$ at $t = \theta_1$. Then
the first impulse of the size $\hat\xi(x, y, \theta_1)$ is applied to push the system
into the region $C$. Then the process is repeated as many times as
necessary over the finite horizon $T$. On the other hand, if the initial
state is not in $C$, an impulse correction should be made to bring the state
into the region $C$ and then continue the process as before. It is possible
to show the rule indicated above will yield the optimal feedback control.
Therefore the construction of the optimal policy depends on the solution
$u(x,y,t)$ of the optimality system (11).

III.  NUMERICAL  APPROXIMATION.  For  numerical  solution,  we  replace  the 
unbounded  x-y-t  space  by  a  rectangular  box: 

$B = \{(x,y,t) \in R^3 : |x| \le a,\ |y| \le b,\ 0 \le t \le T\}$.

By a finite-difference scheme, we approximate the variational inequalities
(11) by a discrete system and introduce some appropriate conditions on the
boundary of B. To this end we set

$\Delta x = a/I$,  $\Delta y = b/J$,  $\Delta t = T/N$,

for some positive integers I, J and N. Denote by Q(I,J,N) the set of mesh
points in B, i.e.

$Q(I,J,N) = \{(x_i, y_j, t_n) : x_i = i\Delta x,\ y_j = j\Delta y,\ t_n = n\Delta t,$
$\qquad i = 0, \pm 1, \ldots, \pm I;\ j = 0, \pm 1, \ldots, \pm J;\ n = 0, 1, \ldots, N\}$.
The pivotal value of the approximate solution $\bar{u}$ to u at the mesh point
$(x_i, y_j, t_n)$ is given by

$u^n_{i,j} = \bar{u}(x_i, y_j, t_n)$.

Next  we  discretize  the  differential  operator  in  (7)  as  follows: 

ui,j  *  cl<i,'J)  u",j  +  c2<1* j)ui, j+1  ■Kc3<1» J>uit j_i 
+  c4(i, j)u”^j ^ j  +  c5(i,J)ui-l, j» 

2  <M.u )Jtj. 

where 

cx  -  (At  c0(i,J)J_1, 

c2  *  +  +  p 

C3  "  <(^)2  +  lq2l<^)  +  pJl+>'  c0(i^>* 

c4  -  (j)+  Ay/  Ax  c0(i,j), 

c5  ■  (J)”  Ay/  Ax  cQ< i ,  j)  , 

co  ■  <E>  *  l"2*^  *  "Jl  +  Ul  &• 

and 

(x]+((x]_)  ■  1  if  x>( < )0 ;  0,  otherwise. 

Note  that 

Cj+Cj+Cj+o^  ■  1,  and  ct>  0,  i-l,...,4. 

This property is crucial and gives rise to a discrete maximum principle.
Next we discretize the quasi-variational inequality (9) as follows:

ujj  <  "»in  <k(ft)  +  uj  :  fi  and  |j+ft|  s*  J> 

n 

=  <M2u)i,j. 

Here  we  set 
$u = \{u^n_{i,j}\}$,

regarded as a vector in the space E defined by

$E = \{u \in R^{(2I+1) \times (2J+1) \times (N+1)} : \|u\| < \infty\}$,

where

$\|u\| = \max_{i,j,n} |u^n_{i,j}|$.


The discrete version of the optimality system (13) can now be written as:

(14)  (i)  $u \le M_\ell u$,  $\ell = 1, 2$,

      (ii)  $\max_{\ell} \{u^n_{i,j} - (M_\ell u)^n_{i,j}\} = 0$, for each i, j, n,

      (iii)  $u^N_{i,j} = g_{i,j}$,

      (iv)  $u^n_{i,j} = v_{i,j}$, for $|i| = I$ or $|j| = J$, $n = 0, 1, \ldots, N-1$,

in which $g_{i,j} = g(x_i, y_j)$ and $v_{i,j}$ is an apparent boundary value imposed
on the artificial boundary. The boundary value v should be chosen to
approximate the asymptotic value of u as $|x|, |y| \to \infty$.

For numerical solution, we will present an iteration procedure. Let
Q be an operator on E defined by

$(Qu)^n_{i,j} = \min_{\ell} (M_\ell u)^n_{i,j}$,  for $n < N$, $i \ne \pm I$, $j \ne \pm J$,
$(Qu)^n_{i,j} = v_{i,j}$,  for $n < N$, $i = \pm I$ or $j = \pm J$,
$(Qu)^n_{i,j} = g_{i,j}$,  for $n = N$.

We start with an initial guess $u^{(0)} = (u^{(0),n}_{i,j})$ which satisfies the
conditions (i), (iii) and (iv) of the system (14). Then we compute

$u^{(1)} = Q u^{(0)}$.

For $\ell > 1$, we compute successively

$u^{(\ell)} = Q u^{(\ell-1)}$,  $\ell = 2, 3, \ldots$.

Given a pre-assigned error $\epsilon > 0$, the iteration terminates at the m-th
step when

$\|u^{(m)} - u^{(m-1)}\| < \epsilon$.

By invoking the built-in maximum principle, we are able to show that
the sequence of iterates $\{u^{(\ell)}\}$ converges monotonically to the solution of
the discrete system (14). Furthermore the numerical procedure is stable.
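
The iteration and its stopping rule can be sketched as follows. This is only an illustration of the fixed-point loop with a max-norm termination test; the operator `toy_operator` is a hypothetical one-line stand-in (the minimum of an affine contraction and a constant obstacle), not the operator Q of the paper, and all names are assumptions made for the example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def max_norm_diff(u: Vector, v: Vector) -> float:
    """Max-norm distance between two vectors of equal length."""
    return max(abs(a - b) for a, b in zip(u, v))


def fixed_point_iterate(
    Q: Callable[[Vector], Vector],
    u0: Vector,
    eps: float,
    max_steps: int = 10_000,
) -> Tuple[Vector, int]:
    """Iterate u <- Q(u) until successive iterates differ by less than eps.

    Returns the final iterate and the step index m at which it stopped.
    """
    u_prev = list(u0)
    for m in range(1, max_steps + 1):
        u_next = Q(u_prev)
        if max_norm_diff(u_next, u_prev) < eps:
            return u_next, m
        u_prev = u_next
    raise RuntimeError("no convergence within max_steps")


def toy_operator(u: Vector) -> Vector:
    """Hypothetical stand-in for Q: componentwise min of two monotone maps."""
    return [min(0.5 * x + 1.0, 3.0) for x in u]


if __name__ == "__main__":
    # Start from a guess lying above the fixed point, so iterates decrease.
    solution, steps = fixed_point_iterate(toy_operator, [10.0, 4.0, 2.0], 1e-8)
    print(solution, steps)
```

In this toy case every component decreases toward the fixed point 2, mirroring the monotone convergence claimed for the discrete system, although the toy operator has none of the spatial coupling of the actual scheme.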

As an example, numerical computation was carried out for the
so-called "cheap control" problem, where k = 0. For some special values
of p, q, r, numerical results are shown in Figs. 1-3. Here we also assume
$f(x,y) = x^2 + y^2$ and q = 0. Fig. 1 and Fig. 2 show the y- and x-section
curves for the minimal cost function u at n = 1, respectively. The
continuation set C, marked by the letter c's, is displayed in Fig. 3.
Numerical  calculation  for  the  general  problem  will  be  undertaken  in  the 
near  future. 
Fig. 1. Minimum Cost Function: y-Section Curves

PULSE-ARRIVAL TIME FOR WAVES IN TURBULENT MEDIA


P.L.  Chow  and  J.L.  Menaldi 
Department  of  Mathematics,  Wayne  State  University 
Detroit,  Michigan  48202 

ABSTRACT.  The  pulse-arrival  time  for  waves  in  turbulent  media  is  treated 
by  the  probabilistic  method.  The  wave-travel  time  is  introduced  as  a 
probabilistic  hitting  time  by  an  ideal  particle  moving  along  a 
characteristic  curve.  By  a  simple-wave  approximation,  the  mean 
travel-time and its variance are investigated and found to satisfy
appropriate boundary-value problems for elliptic equations in the diffusion
limit. As an example, the mean travel-time for a plane wave and its
variance  are  calculated  explicitly. 


I. INTRODUCTION. In wave propagation through a turbulent medium, the
statistical problem for pulse-arrival time has been studied by many
authors (for references, see [1]), due to its important applications, e.g.
the propagation of a laser beam through the atmosphere. In the engineering
literature, a popular method of treating such problems is the so-called
"temporal moment" method. Specifically we let $u(x,t,\omega)$ be the random wave
function of the pulse, and define

$p(x,t) = \dfrac{E|u(x,t)|^2}{\int_0^\infty E|u(x,s)|^2\,ds}$,

where the initial pulse shape is u(x,0) = f(x). Here one considers the
pulse arrival time t at x from the origin as a random variable with p(x,t)
as its probability density function. Thereby the m-th moment is
calculated by

$T_m = \int_0^\infty t^m\,p(x,t)\,dt$.


In  practice  the  above  quantity  is  not  experimentally  measurable  and 
difficult  to  compute.  These  obvious  shortcomings  have  suggested  the  need 
for  a  new  investigation. 

In this paper we shall introduce a probabilistic method for treating
the problem. This will give the pulse-arrival time a proper meaning
conceptually. At the same time a method of approximation will be proposed
for computing the associated moments more efficiently. Some computational
results will be provided for illustration.


II. PULSE-ARRIVAL TIME. Consider the propagation of a pulse-wave through
a turbulent medium in the free space. Then the wave function $u(x,t,\omega)$
satisfies the random wave equation:

(1)  $\dfrac{\partial^2 u}{\partial t^2} = c^2(x,t,\omega)\,\nabla^2 u$,  $t > 0$,  $|x| < \infty$,

which is subject to the initial conditions

(2)  $u(x,0,\omega) = f(x)$,
     $\dfrac{\partial}{\partial t} u(x,0,\omega) = 0$.

Here $x = (x_1, x_2, x_3)$ is the position vector; $c(x,t,\omega)$ the random local wave
speed; $\nabla^2$ the Laplacian operator, and f(x) is the initial pulse form which
is concentrated near the origin. For the wave equation (1), the motion of
the wave-front is governed by the random Hamilton-Jacobi equation [3]:

(3)  $\dfrac{\partial \phi}{\partial t} = c\,|\nabla\phi|$.


By the method of characteristics, the above yields the characteristic
equations

(4)  $\dfrac{dx_t}{dt} = c(x_t, t, \omega)\,\hat{p}_t$,  $x_0 = x$,
     $\dfrac{dp_t}{dt} = -\nabla c(x_t, t, \omega)\,|p_t|$,  $t > 0$,  $p_0 = p$,

where $\hat{p}_t$ denotes the unit vector along the vector $p_t$. Physically the
characteristic curves

$\Gamma_x$:  $y = x_t$,  $t \ge 0$,

are called rays emanating from the point x.

Let B be a spatial region of influence by rays through x. We first
define the "travel-time" from x to B by

(5)  $\tau_x(B) = \inf\{t > 0 : x_t \in B\}$.

By definition, $\tau_x(B)$ is a probabilistic hitting time for the region B by
the random ray $x_t$. In terms of $\tau_x(B)$, it is reasonable to define the
pulse-arrival time $\tau(B)$ to the region B as a random variable with the
conditional probability distribution

$P\{\tau(B) < t \mid x_0 = x\} = P\{\tau_x(B) < t\}$.

The initial distribution $p_0(x)$ may incorporate the pulse-shape by, for
instance, setting

(6)  $p_0(x) = \dfrac{|f(x)|}{\int |f(y)|\,dy}$.

Therefore, from now on, we will only be concerned with the wave-travel
time $\tau_x(B)$, instead of the pulse-arrival time $\tau(B)$.


If $x_t$ is a Markov process, the mathematical theory for the hitting
time or, in general, a stopping time is a well-developed subject [4]. For
example, if

$x_t = x + w_t$,

where $w_t$ is the standard Brownian motion in space, then the mean hitting
time

$T(x) = E\,\tau_x(B)$

is known to satisfy the Dirichlet problem

$\nabla^2 T(x) = -1$,  x in B',
$T(x) = 0$,  x on $\partial B$,

where B' is the exterior region and $\partial B$ the boundary of B. Similarly, for
a general diffusion process, the computation of the mean travel-time or a
higher moment reduces to the solution of the appropriate boundary value
problem. Therefore, in order to employ the method of differential
equations, we are going to seek a diffusion approximation to the ray
process $x_t$. This procedure has been used to study the progressing waves
in random media [1].


III. DIFFUSION APPROXIMATION. Suppose the fluctuation of the wave speed
due to turbulence is weak so that

(7)  $c^\varepsilon(x,t,\omega) = c_0 + \varepsilon\,\xi(x,t,\omega)$,

where $c^\varepsilon$ designates the dependence of c on the small fluctuation amplitude
$\varepsilon > 0$, $c_0$ is the constant average wave speed, and $\xi$ is a random field
satisfying

(8)  $E\,\xi = 0$,
     $E\,\xi(x,t)\,\xi(y,s) = R(x-y, t-s)$.

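In view of (7), the ray equation (4) can be rewritten as

(9)  $\dfrac{dx^\varepsilon_t}{dt} = c^\varepsilon(x^\varepsilon_t, t, \omega)\,\hat{p}^\varepsilon_t$,  $x^\varepsilon_0 = x$,

(10)  $\dfrac{dp^\varepsilon_t}{dt} = -\varepsilon\,\nabla\xi(x^\varepsilon_t, t, \omega)\,|p^\varepsilon_t|$,  $p^\varepsilon_0 = p$.

Since the right-hand side of (10) is small, as a first approximation, it is
neglected. Then the equation (9) becomes

(11)  $\dfrac{dx^\varepsilon_t}{dt} = [c_0 + \varepsilon\,\xi(x^\varepsilon_t, t, \omega)]\,\hat{p}$,  $x^\varepsilon_0 = x$.

Let us denote by $x^0_t$ the unperturbed solution

(12)  $x^0_t = x + c_0 t\,\hat{p}$,

and set

(13)  $y^\varepsilon_t = x^\varepsilon_t - x^0_t$.

Then in view of (12) and (13), the equation (11) yields

(14)  $\dfrac{dy^\varepsilon_t}{dt} = \varepsilon\,\xi(y^\varepsilon_t + x^0_t, t, \omega)\,\hat{p}$,  $y^\varepsilon_0 = 0$.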

To obtain a diffusion approximation for $y^\varepsilon_t$, we assume that the random
field $\xi(x,t,\omega)$ satisfies a strong-mixing condition in space and time so
that $\xi(x,t,\cdot)$ and $\xi(y,s,\cdot)$ become asymptotically independent when either
|t-s| or |x-y| becomes large. Then one can invoke a limit theorem due to
Khasminskii [6] to get the diffusion approximation. That is, for small $\varepsilon$
and large t with $\tau = \varepsilon^2 t$ fixed,

(15)  $y^\varepsilon_t = y^\varepsilon(\tau/\varepsilon^2) \sim y(\tau)$,

where $y(\tau)$ is a diffusion process with known diffusion coefficients $b_{ij}$
and the drift $a_i$. They are defined by

(16)  $b_{ij} = b\,p_i p_j$,  $a_i = a\,p_i$,

where

(17)  $b = \lim_{T\to\infty} \dfrac{1}{T}\int_0^T\!\!\int_0^T R[c_0(t-s)p,\ t-s]\,dt\,ds$,
      $a = \lim_{T\to\infty} \dfrac{1}{T}\int_0^T\!\!\int_0^T \sum_{i=1}^{3} R_i[c_0(t-s)p,\ t-s]\,p_i\,dt\,ds$,

with $R_i(x,t) = \dfrac{\partial R}{\partial x_i}(x,t)$.
Alternatively the process $y(\tau)$ satisfies the Ito equation:

(18)  $dy_i(\tau) = a_i\,d\tau + \sum_{j=1}^{3} b_{ij}\,dw_j(\tau)$,
      $y_i(0) = 0$,  $i = 1, 2, 3$.

Here $w_i(\tau)$, i = 1,2,3, are independent standard Brownian motions. Returning
to the original process $x^\varepsilon_t$, from (13), we get the diffusion
approximation:

(19)  $x^\varepsilon_t \sim x_t = x^0_t + y(\varepsilon^2 t)$.

Therefore, under this approximation, the travel-time $\tau^\varepsilon_x(B) \sim \tau_x(B)$ with

(20)  $\tau^\varepsilon_x(B) = \inf\{t > 0 : x^\varepsilon_t \in B\}$,

(21)  $\tau_x(B) = \inf\{t > 0 : x_t \in B\}$.

As mentioned before, the computation of statistical properties of $\tau_x$ for
the Markov process $x_t$ is much easier.


IV. COMPUTATION OF STATISTICS. We will compute the first two moments of
the travel-time $\tau_x(B)$ as a diffusion approximation. To this end let L be
the generator of the diffusion process $x_t$ given by

(22)  $L = \dfrac{\varepsilon^2}{2}\sum_{i,j} b_{ij}\,\dfrac{\partial^2}{\partial x_i\,\partial x_j} + \sum_i (c_0 p_i + \varepsilon^2 a_i)\,\dfrac{\partial}{\partial x_i}$,

and let T(x,B) and $T_2(x,B)$ denote the first and second moments of $\tau_x(B)$,
respectively. Then it has been shown [7] that they satisfy the following
boundary-value problems:

(23)  $L\,T(x,B) = -1$,  for x in B',
      $T(x,B) = 0$,  for x on $\partial B$,

(24)  $L\,T_2(x,B) = -2\,T(x,B)$,  for x in B',
      $T_2(x,B) = 0$,  for x on $\partial B$,

where B' means the exterior and $\partial B$ the boundary of B. Thereby the
computation of the first two moments of $\tau_x$ is reduced to solving the
respective boundary-value problems (23) and (24). In fact all higher
moments can be obtained in this way (see [7]).

For explicit results, it is interesting to consider a special case.
Suppose initially the wave propagates in the $x_1$-direction, under the
simple wave approximation, so that

(25)  $p = (1, 0, 0)$.

Then the operator L in (22) reduces to

(26)  $L = \dfrac{1}{2}\,b\,\varepsilon^2\,\dfrac{\partial^2}{\partial x_1^2} + (c_0 + a\varepsilon^2)\,\dfrac{\partial}{\partial x_1}$.

Consequently, if we are interested in the travel time for a nearly plane
wave from $x_1 = \ell_1$ to $B = \{x_1 \ge \ell_2\}$, the equations (23) and (24) are
reducible to problems in one dimension, which can be solved easily. The
results are given by:

(27)  $T(\Delta\ell) = \dfrac{\Delta\ell}{\bar{c}}$,

(28)  $T_2(\Delta\ell) = \dfrac{(\Delta\ell)^2}{(\bar{c})^2} + \dfrac{\sigma^2\,\Delta\ell}{(\bar{c})^3}$,

where $\Delta\ell = (\ell_2 - \ell_1)$ and

(29)  $\bar{c} = (c_0 + a\varepsilon^2)$,

(30)  $\sigma^2 = b\,\varepsilon^2$,

which may be interpreted as the "effective" mean and variance parameters,
respectively.

We note that $\bar{c}(\varepsilon) < c_0$ for $\varepsilon > 0$. Physically the drift a, in view of
(17), should be negative as the correlation R is a decreasing function of
x so that the mean travel-time increases due to random scattering. From
(27) and (28) we obtain the standard deviation of the travel-time:

(31)  $\sigma_T(\Delta\ell) = \dfrac{\sigma\,(\Delta\ell)^{1/2}}{(\bar{c})^{3/2}}$,

which is proportional to the correlation length $b^{1/2}\varepsilon$ and the square-root
of distance $(\Delta\ell)^{1/2}$, and inversely, to $(\bar{c})^{3/2}$.
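
The closed-form moments are easy to evaluate numerically. The sketch below assumes the one-dimensional generator $\frac{1}{2}\sigma^2\,\partial^2/\partial x_1^2 + \bar{c}\,\partial/\partial x_1$ with $\bar{c} = c_0 + a\varepsilon^2 > 0$ and $\sigma^2 = b\varepsilon^2$, for which the first-passage time over a distance $\Delta\ell$ has mean $\Delta\ell/\bar{c}$ and variance $\sigma^2\Delta\ell/\bar{c}^3$. The function name, the tuple layout and the sample parameter values are illustrative assumptions, not taken from the paper.

```python
import math
from typing import NamedTuple


class TravelTimeStats(NamedTuple):
    mean: float           # first moment T
    second_moment: float  # second moment T_2
    std: float            # standard deviation sigma_T


def travel_time_stats(
    c0: float, a: float, b: float, eps: float, distance: float
) -> TravelTimeStats:
    """First two moments of the plane-wave travel time over `distance`.

    Assumes effective speed c_bar = c0 + a*eps**2 (must be positive) and
    effective variance parameter sigma2 = b*eps**2 (must be non-negative).
    """
    c_bar = c0 + a * eps ** 2
    sigma2 = b * eps ** 2
    if c_bar <= 0.0:
        raise ValueError("effective speed must be positive")
    if sigma2 < 0.0 or distance < 0.0:
        raise ValueError("sigma2 and distance must be non-negative")
    mean = distance / c_bar
    variance = sigma2 * distance / c_bar ** 3
    return TravelTimeStats(mean, mean ** 2 + variance, math.sqrt(variance))


if __name__ == "__main__":
    # Hypothetical values: no drift correction, then a negative drift.
    print(travel_time_stats(c0=1.0, a=0.0, b=1.0, eps=0.5, distance=4.0))
    print(travel_time_stats(c0=1.0, a=-0.4, b=1.0, eps=0.5, distance=4.0))
```

A negative drift a lowers the effective speed and therefore raises both the mean and the spread of the travel time, in line with the remark that scattering delays the wave.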

As a numerical example, we set $c_0 = 1$ and

(32)  $R(r,t) = \dfrac{e^{-\alpha t}}{1+r}$,  $\alpha > 0$.

Some numerical results are displayed in four graphs, Figs. 1-4 on the
following page. In Fig. 1, the mean travel-time T is plotted against the
distance $x = \Delta\ell$ for $\alpha = 0.7$ and $\varepsilon = 0$, 0.2 and 0.5, respectively, while $\alpha$
is changed to 1.0 in Fig. 2. They show clearly the decrease of T as $\bar{c}$
increases. In Figs. 3 and 4, the three lines correspond to the mean
travel-time T and its deviations $(T \pm \sigma_T)$ vs. $x = \Delta\ell$ with $\varepsilon = 0.5$ fixed, for
$\alpha = 0.7$ and 1.0, respectively. These results show an increase of $\sigma_T$ with
the distance but a decrease of $\sigma_T$ as the decay exponent $\alpha$ increases. Even
though these results are obtained for a special case, the qualitative
feature may hold in general. For details, one is referred to the paper
[7].
THE TRANSITION FROM PHASE LOCKING TO DRIFT
IN  A  SYSTEM  OF  TWO  WEAKLY  COUPLED  VAN  DER  POL  OSCILLATORS** 


Tapesh  Chakraborty  and  Richard  H.  Rand 
Department  of  Theoretical  and  Applied  Mechanics 
Cornell  University,  Ithaca  NY  14853 


ABSTRACT 

We investigate the slow flow resulting from the
application of the two variable expansion perturbation method
to a system of two linearly coupled van der Pol oscillators.
The slow flow consists of three nonlinear coupled ode's on the
amplitudes and phase difference of the oscillators. We obtain
regions in parameter space which correspond to phase locking,
phase entrainment and phase drift of the coupled oscillators.
In the slow flow, these states correspond respectively to a
stable equilibrium, a stable limit cycle and a stable
libration orbit.

Phase entrainment, in which the phase difference between
the oscillators varies periodically, is seen as an
intermediate state between phase locking and phase drift. In
the  slow  flow,  the  transitions  between  these  states  are  shown 
to  be  associated  with  Hopf  and  saddle-connection  bifurcations. 

INTRODUCTION 


In  this  work  we  shall  be  concerned  with  the  behavior  of 
two  coupled  oscillators.  We  begin  by  introducing  some 
terminology.  We  suppose  that  the  outputs  of  the  two 
oscillators  are  of  the  form 

(1.1)  $x_1(t) = R_1(t)\cos(t - \theta_1(t))$

(1.2)  $x_2(t) = R_2(t)\cos(t - \theta_2(t))$

in which $R_i(t)$ represents amplitude modulation and $\theta_i(t)$
represents frequency or phase modulation. We shall define the
terms phase locking, phase drift and phase entrainment for two
functions of the form (1). Although these definitions can be
generalized to apply to a wider class of functions than (1),
we shall restrict our attention to such functions since the
approximate solutions in which we are interested in this work
will have this form.

We define the phase difference $\phi(t)$ as

(2)  $\phi(t) = \theta_1(t) - \theta_2(t)$


*  This  work  was  partially  supported  by  the  grants  NSF  85-09481 
and  AFOSR  84-005 1C. 
Then the functions (1) will be said to be 1:1 phase locked if
$\phi(t)$ is constant. If, on the other hand, the oscillators are
running at unequal average frequencies, then $\phi(t)$ will grow
unbounded, defining the condition of 1:1 phase drift. An
intermediate situation exists when $\phi(t)$ varies periodically, a
condition which we shall call 1:1 phase entrainment [1].

TWO WEAKLY COUPLED VAN DER POL OSCILLATORS

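We shall be interested in the following system of two
coupled van der Pol oscillators

(3.1)  $\dfrac{d^2 x_1}{dt^2} + x_1 - \epsilon\,(1 - x_1^2)\,\dfrac{dx_1}{dt} = \epsilon\,\alpha\,(x_2 - x_1)$

(3.2)  $\dfrac{d^2 x_2}{dt^2} + (1 + \epsilon\Delta)\,x_2 - \epsilon\,(1 - x_2^2)\,\dfrac{dx_2}{dt} = \epsilon\,\alpha\,(x_1 - x_2)$

Here $\epsilon \ll 1$ and $\Delta$ and $\alpha$ are parameters. $\Delta$ is related to the
(small) difference in linearized frequencies, and $\alpha$ represents
the strength of the coupling. In a previous work [2], the two
variable expansion perturbation method was utilized to obtain
an approximate solution to eqs.(3), valid to order $\epsilon$:

(4.1)  $x_1 = R_1(\eta)\cos(t - \theta_1(\eta)) + O(\epsilon)$

(4.2)  $x_2 = R_2(\eta)\cos(t - \theta_2(\eta)) + O(\epsilon)$

where the slow time variable $\eta$ is given by

(5)  $\eta = \epsilon\,t$

and where the amplitudes $R_1$ and $R_2$ and the phase angle
$\phi = \theta_1 - \theta_2$ are given by the slow flow on the space
$M: R^+ \times R^+ \times S^1$

(6.1)  $2\,\dfrac{dR_1}{d\eta} = -R_1\left[\dfrac{R_1^2}{4} - 1\right] + \alpha\,R_2 \sin\phi$

(6.2)  $2\,\dfrac{dR_2}{d\eta} = -R_2\left[\dfrac{R_2^2}{4} - 1\right] - \alpha\,R_1 \sin\phi$

(6.3)  $2\,\dfrac{d\phi}{d\eta} = \Delta + \alpha\left[\dfrac{R_2}{R_1} - \dfrac{R_1}{R_2}\right]\cos\phi$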
Eqs. (6) are invariant under the transformation

(7)  $R_1 \to R_2$,  $R_2 \to R_1$,  $\phi \to \phi - \pi$

and hence possess the corresponding symmetry. Thus if there
is an equilibrium point at $(R_1^0, R_2^0, \phi^0)$, then there is also one
at $(R_2^0, R_1^0, \phi^0 - \pi)$. In order to simplify the following

discussion,  we  shall  only  talk  about  half  of  the  system,  i.e., 
if  we  say  that  the  system  (6)  contains  an  equilibrium  point 
(or  a  periodic  orbit),  then  it  actually  contains  two 
equilibria  (or  periodic  orbits),  the  other  one  being  located 
at  the  symmetrical  position  in  M  under  the  transformation  (7). 
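
The symmetry can be checked numerically on the right-hand side of the slow flow. The sketch below assumes the slow flow has the form $2R_1' = -R_1(R_1^2/4 - 1) + \alpha R_2 \sin\phi$, $2R_2' = -R_2(R_2^2/4 - 1) - \alpha R_1 \sin\phi$, $2\phi' = \Delta + \alpha(R_2/R_1 - R_1/R_2)\cos\phi$ (the bracket in the phase equation is the reading assumed here), and that the transformation swaps the amplitudes while shifting the phase by $\pi$. The function names and the sample point are illustrative.

```python
import math
from typing import Tuple


def slow_flow(
    R1: float, R2: float, phi: float, alpha: float, delta: float
) -> Tuple[float, float, float]:
    """Right-hand side (dR1, dR2, dphi) of the assumed slow flow."""
    dR1 = 0.5 * (-R1 * (R1 ** 2 / 4.0 - 1.0) + alpha * R2 * math.sin(phi))
    dR2 = 0.5 * (-R2 * (R2 ** 2 / 4.0 - 1.0) - alpha * R1 * math.sin(phi))
    dphi = 0.5 * (delta + alpha * (R2 / R1 - R1 / R2) * math.cos(phi))
    return dR1, dR2, dphi


def symmetry_defect(
    R1: float, R2: float, phi: float, alpha: float, delta: float
) -> float:
    """Largest mismatch between the flow at the image point and the
    image of the flow under (R1, R2, phi) -> (R2, R1, phi - pi)."""
    f1, f2, f3 = slow_flow(R1, R2, phi, alpha, delta)
    g1, g2, g3 = slow_flow(R2, R1, phi - math.pi, alpha, delta)
    # At the image point the roles of the two amplitudes are exchanged.
    return max(abs(g1 - f2), abs(g2 - f1), abs(g3 - f3))


if __name__ == "__main__":
    # Sample point with alpha**2 = 0.25 and delta**2 = 0.17.
    print(symmetry_defect(1.8, 0.7, 0.3, 0.5, math.sqrt(0.17)))
```

A defect at the level of rounding error confirms that every equilibrium or periodic orbit of this vector field has a mirror image under the transformation.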

RESULTS 

Our  goal  is  to  classify  the  various  qualitatively 
distinct  behaviors  of  the  system  (6)  as  we  change  the  values 
of  the  parameters  a  and  A.  We  summarize  here  the  results 
obtained  in  [3]. 

The system (6) possesses two other symmetries based on
invariance under the transformations

(8.1)  $\alpha \to -\alpha$,  $\phi \to \phi - \pi$,  and

(8.2)  $\Delta \to -\Delta$,  $\phi \to \pi - \phi$

Thus it turns out that the qualitative behavior is invariant
under both $\alpha \to -\alpha$ and $\Delta \to -\Delta$, and we present our results in
the $\alpha^2$-$\Delta^2$ parameter plane, see Fig. 1.

The system (6) contains 3 equilibria in region R in
Fig. 1, and one equilibrium elsewhere. Region R can be shown
[3] to be bounded by curves having the equation:

(9)  $\Delta^6 + (6\alpha^2 + 2)\,\Delta^4 + (12\alpha^4 - 10\alpha^2 + 1)\,\Delta^2 + 8\alpha^6 - \alpha^4 = 0$

In obtaining (9) as well as many of the other results in this
work, we used the computer algebra system MACSYMA [4].

The presence of limit cycles in (6) may be investigated
by  looking  for  Hopf  bifurcations,  i.e.  by  linearizing  in  the 
neighborhood  of  each  of  the  equilibria,  and  requiring  that 
there  exist  a  pair  of  pure  imaginary  eigenvalues.  This  leads 
to  the  curve  H  in  Fig.l,  which  can  be  shown  [3]  to  have  the 
equation: 
Fig. 1. Bifurcation diagram for the slow flow (6):

E = Region containing a single stable equilibrium point
L = Region containing a stable limit cycle
D = Region containing a stable libration orbit
R  =  Region  containing  3  equilibria 
H  =  Curve  of  Hopf  bifurcations 
S  =  Curve  of  saddle-connection  bifurcations 


(10)  $49\,\Delta^8 + (266\,\alpha^2 + 238)\,\Delta^6 + (88\,\alpha^4 + 758\,\alpha^2 + 345)\,\Delta^4$
      $+ (-1056\,\alpha^6 + 1099\,\alpha^4 + 892\,\alpha^2 + 172)\,\Delta^2$
      $+ (-1152\,\alpha^8 - 2740\,\alpha^6 - 876\,\alpha^4 + 16) = 0$

We showed that stable limit cycles occur in the region
lying above curve H in Fig. 1 by using center manifold theory
and normal forms [3].

The region L of limit cycles is bounded on the other side
by curve S, which we shall show later in this paper to involve
saddle-connection bifurcations. As one crosses curve S, the
limit cycle becomes a libration orbit, i.e. a closed
trajectory in M, along which $\phi$ increases without bound. The
nature of this bifurcation will be discussed in greater detail
in the rest of this paper.

The region D of libration orbits is bounded by the curve
(9) defining region R. As one crosses from region D into
region R, a pair of equilibria are born on the libration
orbit, which becomes a nonperiodic saddle-connection.

Fig. 1 describes the behavior of the slow flow (6). In
terms of the original eqs.(3), region E corresponds to 1:1
phase locking, region L to 1:1 phase entrainment, and region D
to 1:1 phase drift. Thus as one moves in the parameter space
along the dashed line (corresponding to holding the coupling
strength $\alpha$ fixed while increasing the frequency difference $\Delta$),
the system (3) passes from phase locking to phase entrainment
and then to drift.

THE ENTRAINMENT-DRIFT BIFURCATION

In  order  to  better  understand  the  nature  of  the 
bifurcation  which  results  as  one  crosses  from  region  L  to 
region  D  in  Fig.l,  we  display  the  results  of  some  numerical 
integrations of the system (6). Figs. 2-7 show these results
in a portion of the phase space M in which

(11)  $0 \le R_1 \le 4$,  $0 \le R_2 \le 4$,  $-\pi \le \phi \le \pi$

In each of Figs. 2-7, $\alpha^2 = 0.25$, while $\Delta^2$ varies from 0.15 in
Fig. 2 (in region E) to 0.21 in Fig. 7 (in region D). Thus this
sequence of views corresponds to moving along the dashed line
in Fig. 1. Fig. 2 shows the asymptotic approach to steady state
equilibrium, while Figs. 3-7 show only the steady state
periodic motions. Between Figs. 2 and 3 a Hopf bifurcation has
occurred corresponding to passage across curve H in Fig. 1.
Figs. 3-6 show the gradual increase in size of the limit cycle,
and the curve S in Fig. 1 is crossed between Figs. 6 and 7.
Fig. 4. Limit cycle obtained by numerical integration of eqs.(6)
for $\alpha^2 = 0.25$, $\Delta^2 = 0.17$. See Fig. 2 for labeling of axes.

Fig. 6. Limit cycle obtained by numerical integration of eqs.(6)
for $\alpha^2 = 0.25$, $\Delta^2 = 0.20$. See Fig. 2 for labeling of axes.

Fig. 7. Libration orbit obtained by numerical
integration of eqs.(6) for $\alpha^2 = 0.25$, $\Delta^2 = 0.21$.
Note that this orbit is closed since the
planes $\phi = \pi$ and $\phi = -\pi$ are identified with
each other. See Fig. 2 for labeling of axes.


We note from the numerical integrations that the
transition between the limit cycle in Fig. 6 and the libration
orbit in Fig. 7 involves a drastic change in the neighborhood
of the surface $R_2 = 0$, but little change in the steady state
motions elsewhere. In Fig. 6, the portion of the limit cycle
near $R_2 = 0$ is a rapid motion at nearly constant $R_1$, i.e. a
jump down in $\phi$. In Fig. 7, on the contrary, the libration
orbit jumps up in $\phi$ near $R_2 = 0$.

This observation leads us to examine the behavior of the
system (6) when $R_2 = 0$. Since $R_2 = 0$ is a singular surface for
the system (6), we change independent variables from $\eta$ to $\tau$,
where

(12)  $d\eta = R_2\,d\tau$

This transformation "blows up" the singularity at $R_2 = 0$ ([5],
§7.2), and while it reparametrizes the motion along the
trajectories in M, it does not change the shape of the
trajectories. Under (12), the system (6) becomes:

(13.1)  $2\,\dfrac{dR_1}{d\tau} = -R_1 R_2\left[\dfrac{R_1^2}{4} - 1\right] + \alpha\,R_2^2 \sin\phi$

(13.2)  $2\,\dfrac{dR_2}{d\tau} = -R_2^2\left[\dfrac{R_2^2}{4} - 1\right] - \alpha\,R_1 R_2 \sin\phi$

(13.3)  $2\,\dfrac{d\phi}{d\tau} = \Delta\,R_2 + \alpha\left[\dfrac{R_2^2}{R_1} - R_1\right]\cos\phi$

We note that the surface $R_2 = 0$ is an invariant manifold of
(13), which becomes when $R_2 = 0$:

(14.1)  $\dfrac{dR_1}{d\tau} = 0$

(14.2)  $\dfrac{dR_2}{d\tau} = 0$

(14.3)  $2\,\dfrac{d\phi}{d\tau} = -\alpha\,R_1 \cos\phi$
Eqs.(14) show that $R_1$ remains constant in time when
$R_2 = 0$, and that only $\phi$ changes. Eq.(14.3) has the general
solution

(15)  $\tan\left[\dfrac{\phi}{2} + \dfrac{\pi}{4}\right] = C\,e^{-\alpha R_1 \tau/2}$

where C is an arbitrary constant. Fig. 8 shows the flow
(14.3) on the $\phi$-circle (since $R_1$ and $R_2$ are constant). Both
Fig. 8 and eq.(15) show that as $\tau \to +\infty$, $\phi \to -\pi/2$, while as
$\tau \to -\infty$, $\phi \to +\pi/2$. This represents the previously referred to
jump in $\phi$, see Fig. 9.
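
The closed-form solution can be checked against the phase equation on the singular surface. The sketch below assumes that equation reads $2\,d\phi/d\tau = -\alpha R_1 \cos\phi$ and that the solution is $\tan(\phi/2 + \pi/4) = C\,e^{-\alpha R_1 \tau/2}$ with $C > 0$, so that $\phi$ stays between $-\pi/2$ and $\pi/2$; the function names, the finite-difference step and the sample values are illustrative.

```python
import math


def phi_closed_form(tau: float, C: float, alpha: float, R1: float) -> float:
    """Phase from tan(phi/2 + pi/4) = C * exp(-alpha*R1*tau/2), with C > 0."""
    return 2.0 * math.atan(C * math.exp(-alpha * R1 * tau / 2.0)) - math.pi / 2.0


def ode_residual(
    tau: float, C: float, alpha: float, R1: float, h: float = 1e-5
) -> float:
    """Residual of 2*dphi/dtau + alpha*R1*cos(phi) = 0, derivative taken
    by a central finite difference of the closed form."""
    dphi = (
        phi_closed_form(tau + h, C, alpha, R1)
        - phi_closed_form(tau - h, C, alpha, R1)
    ) / (2.0 * h)
    phi = phi_closed_form(tau, C, alpha, R1)
    return 2.0 * dphi + alpha * R1 * math.cos(phi)


if __name__ == "__main__":
    alpha, R1, C = 0.5, 2.0, 1.0  # hypothetical values
    for tau in (-3.0, 0.0, 3.0):
        print(tau, phi_closed_form(tau, C, alpha, R1), ode_residual(tau, C, alpha, R1))
```

The limits at large positive and negative $\tau$ reproduce the jump from $+\pi/2$ down to $-\pi/2$ along the surface.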

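Thus the system (13) has two lines of nonisolated
equilibria at $R_2 = 0$, $\phi = +\pi/2$ and at $R_2 = 0$, $\phi = -\pi/2$, shown as
dashed lines in Fig. 9. In order to better understand the
bifurcation, we consider the nature of these equilibria.

We linearize eqs.(13) about the equilibrium

(16)  $R_1 = R_1^0$ = any,  $R_2 = 0$,  $\phi = \pm\pi/2$

and obtain the variational equations:

(17.1)  $2\,\dfrac{d}{d\tau}\,\delta R_1 = -R_1^0\left[\dfrac{(R_1^0)^2}{4} - 1\right]\delta R_2$

(17.2)  $2\,\dfrac{d}{d\tau}\,\delta R_2 = \mp\,\alpha\,R_1^0\,\delta R_2$

(17.3)  $2\,\dfrac{d}{d\tau}\,\delta\phi = \Delta\,\delta R_2 \pm \alpha\,R_1^0\,\delta\phi$

which have the general solution:

(18)  $\begin{pmatrix}\delta R_1\\ \delta R_2\\ \delta\phi\end{pmatrix} = k_1\begin{pmatrix}1\\ 0\\ 0\end{pmatrix} + k_2\begin{pmatrix}0\\ 0\\ 1\end{pmatrix} e^{\pm R_1^0 \alpha \tau/2} + k_3\begin{pmatrix}(R_1^0)^2 - 4\\ \pm 4\alpha\\ -2\Delta/R_1^0\end{pmatrix} e^{\mp R_1^0 \alpha \tau/2}$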
Fig. 8. Flow on the $\phi$-circle given by eq.(14.3). Arrows show the
direction of the flow. Dots represent the equilibria $\phi = \pm\pi/2$.

where the $k_i$ are arbitrary constants. The (1 0 0) eigenvector
and accompanying zero eigenvalue are due to the line of
nonisolated equilibria and their neutral stability. The
(0 0 1) eigenvector corresponds to the flow (15) in the $R_2 = 0$
plane. The motion associated with the third eigenvector
permits approach to (or exit from) the equilibrium (16) from
(or to) the rest of the space M. Since the equilibria are
nonisolated, these eigendirections form a 2-manifold in M, see
Fig. 10. These surfaces form a stable manifold for the line of
equilibria $\phi = \pi/2$ and an unstable manifold for the line of
equilibria $\phi = -\pi/2$.

We return now to the question of the bifurcation from the
limit cycle of Fig. 6 to the libration orbit of Fig. 7. In
Fig. 11 we superimpose Figs. 6 and 7 on the eigenmanifolds of
Fig. 10. Fig. 11 shows that the bifurcation involves a
saddle-connection orbit.

The bifurcation may be further understood by projecting
the limit cycle and libration motions on the $\phi$-$\eta$ cylinder, see
Fig. 12.

CONCLUSIONS 


It  is  interesting  to  compare  the  results  of  this  study  of 
weakly  coupled  van  der  Pol  oscillators  with  those  of  an 
earlier  study  [6]  of  strongly  coupled  van  der  Pol  oscillators. 
In [6] the following system was analyzed by using
perturbations to $O(\epsilon)$:

(19.1)  $\dfrac{d^2 x_1}{dt^2} + x_1 - \epsilon\,(1 - x_1^2)\,\dfrac{dx_1}{dt} = \alpha\,(x_2 - x_1)$

(19.2)  $\dfrac{d^2 x_2}{dt^2} + (1 + \Delta)\,x_2 - \epsilon\,(1 - x_2^2)\,\dfrac{dx_2}{dt} = \alpha\,(x_1 - x_2)$

Note that eqs.(19) are of the same form as eqs.(3) with $\epsilon\alpha$
and $\epsilon\Delta$ replaced by $\alpha$ and $\Delta$ respectively. The slow flow
resulting from the application of the two variable expansion
method to eqs.(19) has been shown [6] to exhibit phase locking
iff $\Delta^2 < 2\alpha^2$. If $\Delta^2 > 2\alpha^2$, the system (19) performs a
quasiperiodic motion which corresponds to phase drift, see
Fig. 13.

For large values of $\alpha$ and $\Delta$, the Hopf curve (10) in the
weakly coupled case may be written (keeping highest order
terms only):

(20)  $49\,\Delta^8 + 266\,\alpha^2 \Delta^6 + 88\,\alpha^4 \Delta^4 - 1056\,\alpha^6 \Delta^2 - 1152\,\alpha^8 = 0$
Fig. 10. Eigendirections for the equilibria
(16). These surfaces form a stable manifold
for the line of equilibria $\phi = \pi/2$ and an
unstable manifold for the line of equilibria
$\phi = -\pi/2$. See Fig. 2 for labeling of axes.

Fig. 12. Projection of the limit cycle of Fig. 6 and the libration
orbit of Fig. 7 onto the $\phi$-$\eta$ cylinder. The top and bottom of the
cylinder represent $\phi = \pi/2$ and $\phi = -\pi/2$, respectively. The limit
cycle (corresponding to phase entrainment) lies entirely on the
front face of the cylinder, while the libration motion
(corresponding to drift) winds completely around the cylinder.

In the case of the phase entrained motion, $\phi(\eta)$ is periodic,
while for the drift case, $\phi(\eta)$ becomes unbounded as $\eta$ goes to
infinity.
Fig. 13. Comparison of the dynamics of the weakly coupled system
(3) studied in this work with the strongly coupled system (19)
studied in [6]. The shaded region represents phase entrainment.
(Panels: WEAK COUPLING, STRONG COUPLING.)
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which  may  be  factored  to  give 

(21)  (A2  -  2  a2)  (A2  +  4  a2)  (7  A2  +  12  a2)2  =  0 

2  2 

Thus  the  Hopf  curve  (10)  approaches  A  =  2a  as  A  and  a 
approach  infinity,  in  agreement  with  the  strongly  coupled  case 
(19).  Numerical  simulation  of  the  slow  flow  system  (6) 
reveals that the saddle connection curve S of Fig. 1 also
approaches this limit. See Fig. 13, in which the behavior of
systems (3) and (19) is compared.
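
The factorization of (20) into (21) can be checked mechanically. The
sketch below is not part of the original analysis. It assumes the
normalization x = A^2/a^2 (introduced here only for convenience) and
multiplies the factors of (21) as coefficient lists; the helper name
poly_mul is hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for k, q_k in enumerate(q):
            out[i + k] += p_i * q_k
    return out


# Factors of (21) after dividing through by a^8, with x = A^2 / a^2.
f_hopf = [-2, 1]   # x - 2
f_plus = [4, 1]    # x + 4
f_sq = [12, 7]     # 7x + 12 (this factor appears squared)

expanded = poly_mul(poly_mul(f_hopf, f_plus), poly_mul(f_sq, f_sq))

# Coefficients of (20) in the same normalization, lowest degree first.
eq20 = [-1152, -1056, 88, 266, 49]

print(expanded == eq20)
```

Of the three factors, only x - 2 has a positive root, which is why the
limiting curve is $A^2 = 2 a^2$.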

In  conclusion,  we  see  that  the  approximate  solution  for 
the  weakly  coupled  case  studied  in  this  paper  exhibits  a 
gradual  transition  from  phase  locking  to  phase  drift, 
separated  by  an  intermediate  region  of  phase  entrainment.  The 
comparable transition in the strongly coupled case is sharp.

As noted in [3] and [6], numerical simulations of the original
systems (3) and (19) show that the actual transition is in
fact gradual.


ABSTRACT. The design of a digital controller for weapon pointing
and stabilization requires an accurate plant and disturbance model in
order  to  achieve  optimal  performance.  However,  these  models  are  often 
unknown.  Moreover,  the  disturbance  and  plant  dynamics  (such  as 
helicopter  turret  motion)  in  weapon  pointing  systems  are  often  time 
varying.  Adaptive  control,  which  permits  on-line  parameter 
identification  update,  can  improve  weapon  pointing  accuracy  under 
these  conditions.  An  adaptive  weapon  control  module  has  been  developed 
to  perform  minimum  variance  self-tuning  control  for  advanced  armament 
system  applications.  A  weighted  control  scheme  is  used  to  obtain  a 
stable  control  law  for  non-minimum  phase  plants  which  frequently  arise 
in  digital  control  implementations.  The  adaptive  control  module 
consists  of  parallel  Intel  286/287  and  86/87  processors  for 
identification  and  control  updates.  Preliminary  test  results  using  a 
laboratory  test  fixture  will  be  discussed  which  demonstrate  the 
performance  capabilities  of  the  system.  Final  evaluation  of  the 
performance of the adaptive weapon control module will be carried out
in  FY86  using  a  30  mm  automatic  cannon  mounted  on  a  Cobra  aircraft. 


INTRODUCTION. Multivariable adaptive control [1] can
significantly improve the performance of control systems where the
plant or the disturbance model is not completely known. In particular,
to achieve high performance in gun pointing control systems for
aircraft/weapon applications where the disturbance model and aircraft
dynamics are time varying, adaptive control may render greater
pointing accuracy over broader performance envelopes.

Designing a robust, high bandwidth digital weapon controller
requires precise knowledge and analytical models of both high and low
frequency system dynamics, including the platform disturbance
environment. Typically, the high frequency system dynamics are not
known a priori or, in many cases, the relevant model parameters
exhibit time varying behavior, resulting in poor performance, or even
instability, in systems with fixed gain controllers. Some of the more
commonplace  situations  include  (1)  variations  in  structural 
characteristics  due  to  changes  in  weapon  firing  angle/engagement 
geometry,  barrel  heating  and  nonlinear  effects,  (2)  variations  in  the 
disturbance  frequency  spectrum  due  to  changes  in  air/ground  speed, 
turbulence, ground roughness, firing rates, maneuvers, etc., and (3)
variations in plant dynamics due to sensor/actuator failure or
malfunction, and component degradation. It is under these conditions
that on-line adaptive control is used to maintain peak operating
performance.

In the next section we review the minimum variance adaptive control
algorithms. We then describe adaptive control with a weighted control
scheme, followed by a general description of the adaptive control
module configuration. Finally, the implementation of the adaptive
control law is presented and discussed.


ALGORITHMS  OF  ADAPTIVE  CONTROL.  The  basic  structure  for  an 
adaptive  control  system  is  shown  in  Figure  1.  The  plant  represents  the 
physical  weapon  system  which  is  being  controlled  including  actuators, 
sensors,  platform,  weapon  and  disturbance  dynamics.  The  vector  Y 
consists  of  all  platform  sensor  outputs  which  are  processed  by  a 
variable  parameter  controller.  This  controller  consists  of  a  parameter 
identification  update  algorithm  which  identifies  all  model  parameters 
from the measurement vector Y and the control vector u. The approach
discussed in this paper uses a $U D U^T$ factored recursive least squares
identification algorithm [2,3]. The model parameters are then passed
to a control design algorithm which predicts the next plant output and
computes the control input u to the plant for pointing and
stabilization.


The  plant  dynamics  of  a  typical  weapon  pointing  system  can  be 
represented  in  discrete  form  by  a  system  of  equations  expressed  in 
observer  canonical  form 


    $X(j+1) = A X(j) + B u(j)$    (1)

    $Y(j) = C X(j)$    (2)

where

    $A = \begin{bmatrix} -a_1 & 1 & 0 & \cdots & 0 \\ -a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -a_{n-1} & 0 & 0 & \cdots & 1 \\ -a_n & 0 & 0 & \cdots & 0 \end{bmatrix}$,
    $B = [b_1 \; \cdots \; b_n]^T$,  $C = (1 \; 0 \; \cdots \; 0)$    (3)

and X(j) is the current state vector of the system, Y(j) the
measurement and n the number of system states. To identify the system
model parameters $a_i$ and $b_i$, one may rewrite the measurement equation as


    $Y(j) = \phi^T(j) \theta + e(j)$    (4)

    $\theta = [a_1, \ldots, a_n, b_1, \ldots, b_n]^T$    (5)

    $\phi(j) = [-Y(j-1), \ldots, -Y(j-n), u(j-1), \ldots, u(j-n)]^T$    (6)

    $e(j) = Y(j) - \hat{Y}(j)$    (7)


where $\hat{Y}(j)$ is the estimated measurement.

The parameters can then be estimated with the well established
weighted recursive least squares method given by the equations

    $\hat{\theta}(j) = \hat{\theta}(j-1) + L(j-1) E(j)$    (8)

    $E(j) = Y(j) - \phi^T(j-1) \hat{\theta}(j-1)$    (9)

    $L(j-1) = \frac{P(j-2) \phi(j-1)}{\alpha(j-1) + \phi^T(j-1) P(j-2) \phi(j-1)}$    (10)

    $P(j-1) = \frac{1}{\alpha(j-1)} [P(j-2) - L(j-1) \phi^T(j-1) P(j-2)]$    (11)

    $\alpha(j) = \alpha_0 \alpha(j-1) + (1 - \alpha_0)$    (12)

where $\hat{\theta}(0)$ is given, $P(-1) > 0$, $\alpha(0) = 0.95$, and $\alpha_0 = 0.99$.
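
The recursion above can be sketched in a few lines of plain Python.
This is only an illustrative, unfactored version under stated
assumptions: a noise-free second-order plant with hypothetical
parameters, a random input, an initial covariance of 1000 I, and a
regressor that is formed at each sample from the most recent data (so
the time indices are simplified relative to (8)-(11)). The function
names rls_step and identify are not from the paper.

```python
import random


def rls_step(theta, P, phi, y, alpha):
    """One weighted recursive least squares update in the form of (8)-(11)."""
    n = len(theta)
    P_phi = [sum(P[i][k] * phi[k] for k in range(n)) for i in range(n)]
    denom = alpha + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]                          # L, eq. (10)
    err = y - sum(phi[i] * theta[i] for i in range(n))         # E, eq. (9)
    theta_new = [theta[i] + gain[i] * err for i in range(n)]   # eq. (8)
    # P is symmetric, so phi^T P equals (P phi)^T.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / alpha for j in range(n)]
             for i in range(n)]                                # eq. (11)
    return theta_new, P_new


def identify(true_theta, steps=300, seed=0):
    """Identify a noise-free second-order plant driven by a random input."""
    a1, a2, b1, b2 = true_theta
    rng = random.Random(seed)
    theta = [0.0] * 4
    P = [[1000.0 if i == k else 0.0 for k in range(4)] for i in range(4)]
    alpha, alpha0 = 0.95, 0.99
    y1 = y2 = u1 = u2 = 0.0            # Y(j-1), Y(j-2), u(j-1), u(j-2)
    for _ in range(steps):
        phi = [-y1, -y2, u1, u2]       # regressor of eq. (6) with n = 2
        y = -a1 * y1 - a2 * y2 + b1 * u1 + b2 * u2
        theta, P = rls_step(theta, P, phi, y, alpha)
        alpha = alpha0 * alpha + (1.0 - alpha0)                # eq. (12)
        y2, y1 = y1, y
        u2, u1 = u1, rng.uniform(-1.0, 1.0)
    return theta


true_theta = [-1.5, 0.7, 1.0, 0.5]     # hypothetical a1, a2, b1, b2
estimate = identify(true_theta)
print(estimate)
```

With persistently exciting input and no noise, the estimate should
settle close to the true parameter vector; with measurement noise the
forgetting factor trades tracking speed against variance.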

In  the  factored  version  of  the  weighted  recursive  least  squares 
algorithm,  the  covariance  matrix  P  is  factored  as 

    $P = U D U^T$    (13)

where U is a unit upper triangular matrix and D is a diagonal matrix.
The matrices U, D, and L(j) are updated in such a way that the
positive definite property of the diagonal matrix D is maintained.
This is essential to maintain the computational stability in the
presence of truncation and round off errors associated with
microprocessor implementation.
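
To make the factorization (13) concrete, the following sketch factors
a small symmetric positive definite matrix into a unit upper
triangular U and a diagonal D and then rebuilds it. It shows only the
static factorization, not the recursive update of U and D used in the
identification algorithm; the test matrix and the function names are
assumptions made for illustration.

```python
def udu_factor(P):
    """Factor a symmetric positive definite P as U D U^T, U unit upper triangular."""
    n = len(P)
    U = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    D = [0.0] * n
    # Work from the last column back to the first.
    for j in range(n - 1, -1, -1):
        D[j] = P[j][j] - sum(D[k] * U[j][k] ** 2 for k in range(j + 1, n))
        for i in range(j):
            s = sum(D[k] * U[i][k] * U[j][k] for k in range(j + 1, n))
            U[i][j] = (P[i][j] - s) / D[j]
    return U, D


def udu_product(U, D):
    """Rebuild U D U^T from its factors."""
    n = len(D)
    return [[sum(U[i][k] * D[k] * U[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


P = [[4.0, 2.0, 0.6],
     [2.0, 2.0, 0.5],
     [0.6, 0.5, 1.0]]
U, D = udu_factor(P)
rebuilt = udu_product(U, D)
print(D)
```

All entries of D come out positive for a positive definite P, which is
the property the factored update is designed to preserve.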

Given the estimate $\hat{\theta}(j)$ of the plant parameter vector $\theta(j)$, one
may use a pole placement procedure such as the one described in
reference [4] or one may compute directly the minimum variance control
which can be shown to be equivalent to solving the system of linear
equations

    $Y^*(j+d) = \phi^T(j) \hat{\theta}(j)$    (14)


where  Y*  is  the  desired  reference  trajectory  and  d  is  the  delay 
between  the  input  and  plant  output. 
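
For a single-input, single-output second-order model with delay d = 1,
solving (14) for the control reduces to one division. The sketch below
assumes exact knowledge of the parameters and a noise-free, minimum
phase plant with hypothetical coefficients; mv_control and simulate
are illustrative names, not part of the module software.

```python
def mv_control(theta, y_ref_next, y_now, y_prev, u_prev):
    """Solve Y*(j+1) = phi^T theta for u(j); second-order model, delay d = 1."""
    a1, a2, b1, b2 = theta
    return (y_ref_next + a1 * y_now + a2 * y_prev - b2 * u_prev) / b1


def simulate(theta, reference):
    """Close the loop around a plant whose parameters equal theta."""
    a1, a2, b1, b2 = theta
    y_now = y_prev = u_prev = 0.0
    outputs = []
    for y_ref in reference:
        u = mv_control(theta, y_ref, y_now, y_prev, u_prev)
        y_next = -a1 * y_now - a2 * y_prev + b1 * u + b2 * u_prev   # plant
        outputs.append(y_next)
        y_prev, y_now, u_prev = y_now, y_next, u
    return outputs


theta = (-1.5, 0.7, 1.0, 0.5)            # minimum phase: zero at -0.5
reference = [1.0] * 10 + [-1.0] * 10     # square wave command
outputs = simulate(theta, reference)
tracking_error = max(abs(y - r) for y, r in zip(outputs, reference))
print(tracking_error)
```

In this idealized setting the output reaches the reference one sample
after each command. If the zero of the plant lay outside the unit
circle, the same law would still track but the control sequence would
grow without bound, which motivates the weighted scheme of the next
section.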


ADAPTIVE CONTROL WITH WEIGHTED CONTROL SCHEME. Minimum variance
control will generally exhibit poor performance when the physical system
has unstable zeros. In this case, it is often helpful to use a
weighted control scheme to get good adaptive control. The ARMAX model
representing the system given in equation (4) is

    $A(q^{-1}) Y(j+1) = B(q^{-1}) u(j) + C(q^{-1}) V(j+1)$    (15)

    $A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_n q^{-n}$    (16)

    $B(q^{-1}) = b_1 + b_2 q^{-1} + b_3 q^{-2} + \cdots + b_n q^{-(n-1)}$

    $C(q^{-1}) = 1 + c_1 q^{-1} + c_2 q^{-2} + \cdots + c_n q^{-n}$

where V(j+1) is a white noise process and $q^{-1}$ is the time delay operator.

Minimizing  the  weighted  control  index 

    $J = E\{ |Y(j+1) - Y^*(j+1)|^2 + \rho^2 |u(j)|^2 \}$    (17)

at time j gives a control law that requires
$[B(q^{-1}) + (\rho^2/b_1) A(q^{-1})]$ to be stable instead of $B(q^{-1})$,
where $\rho^2$ is the weight on the control u(j).

Hence  adaptive  control  with  weighted  control  scheme  provides  stable 
control  for  systems  that  have  unstable  zeros. 

An alternate form of implementing the weighted control scheme is
to replace Y(j+1) by

    $\bar{Y}(j+1) = Y(j+1) + \gamma u(j)$    (18)

    $Y^*(j+1) = \bar{Y}(j+1) - V(j+1)$    (19)

in the usual minimum variance adaptive control law, where $\gamma = \rho^2/b_1$.

From equations (15), (18), and (19)

    $A(q^{-1}) \bar{Y}(j+1) = [B(q^{-1}) + \gamma A(q^{-1})] u(j) + C(q^{-1}) V(j+1)$    (20)

    $[B(q^{-1}) + \gamma A(q^{-1})] u(j) = [A(q^{-1}) - C(q^{-1})] V(j+1) + A(q^{-1}) Y^*(j+1)$    (21)

In order to achieve good performance, choose $\gamma$ such that the roots
of $[B(q^{-1}) + \gamma A(q^{-1})]$ lie inside the unit circle.
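
The effect of the weight on the root locations can be seen on a small
example. The sketch below assumes a second-order model with
hypothetical coefficients whose zero lies at -1.5, and compares the
roots of B alone (weight 0) with the roots of B plus the weight times
A for a weight of 2. The function names are illustrative.

```python
import cmath


def quadratic_roots(c2, c1, c0):
    """Roots of c2*z^2 + c1*z + c0 = 0 (c2 must be nonzero)."""
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return (-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)


def weighted_roots(a1, a2, b1, b2, gamma):
    """Roots in z of B(q^-1) + gamma*A(q^-1) for a second-order model.

    B + gamma*A = (b1 + gamma) + (b2 + gamma*a1) q^-1 + gamma*a2 q^-2,
    multiplied through by z^2.
    """
    return quadratic_roots(b1 + gamma, b2 + gamma * a1, gamma * a2)


a1, a2, b1, b2 = -1.5, 0.7, 1.0, 1.5      # non-minimum phase: zero at -1.5

radius_unweighted = max(abs(z) for z in weighted_roots(a1, a2, b1, b2, 0.0))
radius_weighted = max(abs(z) for z in weighted_roots(a1, a2, b1, b2, 2.0))
print(radius_unweighted, radius_weighted)
```

For these numbers the largest root magnitude drops from 1.5 to about
0.68, so the weighted law is stable where the unweighted one is not.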

However,  the  weighted  control  scheme  generally  creates  steady 
state  errors  in  the  system  outputs  which  are  unacceptable  for 
tracking/pointing  applications.  One  way  to  deal  with  this  steady  state 
error problem is to insert frequency shaping at the input and the
output of the system such that the overall control becomes a
model following adaptive control.
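
The steady state error can be reproduced in a short simulation. The
sketch assumes a noise-free second-order plant with known,
hypothetical coefficients, a constant reference r, and a control that
enforces Y(j+1) plus the weight times u(j) equal to r at every step.
Under those assumptions the output should settle at r g/(g + weight),
where g = B(1)/A(1) is the plant DC gain.

```python
def weighted_steady_state(theta, gamma, r=1.0, steps=400):
    """Run the weighted law against a known plant and return the final output."""
    a1, a2, b1, b2 = theta
    y_now = y_prev = u_prev = 0.0
    for _ in range(steps):
        # Control chosen so that Y(j+1) + gamma*u(j) equals the reference r.
        u = (r + a1 * y_now + a2 * y_prev - b2 * u_prev) / (b1 + gamma)
        y_next = -a1 * y_now - a2 * y_prev + b1 * u + b2 * u_prev
        y_prev, y_now, u_prev = y_now, y_next, u
    return y_now


theta = (-1.5, 0.7, 1.0, 1.5)                  # hypothetical a1, a2, b1, b2
gamma = 2.0
dc_gain = (1.0 + 1.5) / (1.0 - 1.5 + 0.7)      # B(1) / A(1)
predicted = dc_gain / (dc_gain + gamma)
simulated = weighted_steady_state(theta, gamma)
print(predicted, simulated)
```

The offset vanishes only when the weight is zero or the DC gain is
unbounded, which is why additional shaping is needed for tracking.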

The frequency shaping $N(q^{-1})/D(q^{-1})$ changes the modified output to

    $\bar{Y}(j+1) = \frac{N(q^{-1})}{D(q^{-1})} Y(j+1) + \gamma R(q^{-1}) u(j)$    (22)

where N(.), D(.), and R(.) are monic.

Now, it is required that N(.) and D(.) have stable roots and that the
roots of $[N(q^{-1}) B(q^{-1}) + \gamma R(q^{-1}) A(q^{-1}) D(q^{-1})]$ are stable. The
$\gamma R(q^{-1})$ term is used to stabilize the unstable zeros of $B(q^{-1})$.


ADAPTIVE  CONTROL  MODULE  CONFIGURATION.  The  parallel  architecture 
of  the  adaptive  control  processor  module  is  shown  in  Figure  2.  The 
Intel  8086/8087  contains  all  analog  to  digital  (A/D)  and  digital  to 
analog (D/A) circuitry and is programmable for easy scaling and
manipulation of data. It operates as a fast system for
controller/prediction operation. The Intel 80286/80287 processor board
with 1 megabyte of RAM serves as the master processor and also
supports floating point implementation of the factored weighted
recursive least squares identification update and control law update
algorithms. This processor will compute control law updates and
identification updates every 8500 μsec for a 12 state, 2 input - 2
output system.


ADAPTIVE CONTROL LAW IMPLEMENTATION. The controller software is
developed with a VAX 11/780 host computer. The developed program is
downloaded  to  the  adaptive  control  processor  module  which  will  output 
control  commands  to  a  test  fixture,  in  this  case  an  inertia  wheel. 
Angular  measurement  of  the  system  and  reference  command  are  fed  to  the 
control  module  as  inputs. 

The  test  fixture  consists  of  two  DC  torque  motors  which  drive  the 
inertia  wheel  and  a  resolver  which  measures  the  shaft  angle  of  the 
wheel. Through electronic circuitry, the resolver signal is
amplified  and  demodulated  before  entering  the  A/D  converter  in  the 
control  module.  The  analog  control  command  from  the  module  actuates 
the  system  through  a  torque  drive  amplifier  in  the  electronic 
circuitry. 

Accounting  for  the  effect  of  a  zero  order  hold  in  the  A/D 
converter,  the  open-loop  response  of  the  test  fixture  in  z-domain  is 

    $Y(z) = \frac{b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} U(z)$    (23)


The weighted control adaptive control design is implemented with
the inertia wheel test fixture. A square wave sequence is fed to the
system as a reference command signal. The sampling rate and the
control rate are set to 100 Hz. The identification update and control
law update are processed at 10 Hz. The initial values of the
parameters are estimated from the open-loop data of the system. Figure
3 shows the input square wave command $Y^*$, the control signal u, and
the output Y of the system. Here the output approaches the reference
command with good stability. In this case, the weighting
factor $\gamma$ in equation (18) is tuned to -2.85. When the factor $\gamma$ is set
to -5.0, Figure 4 shows the steady state error of the output Y from
the reference signal $Y^*$.


DISCUSSION. We have reviewed the minimum variance adaptive
control algorithm as well as the weighted recursive least squares
identification method. We have also described the weighted control
scheme for getting around the non-minimum phase problem and discussed
the model following/frequency shaping approach for dealing with the
tracking problem. The basic configuration of the control module using
parallel Intel 286/287 and 86/87 processors was presented. Some
preliminary test results of using the weighted control adaptive
control algorithm interfaced with the laboratory test fixture were
shown to have good stability. Based on computer simulations
and the results of this implementation study, we strongly believe that
digital adaptive control for aircraft/weapon pointing applications is
a practical and viable option. A final evaluation of the performance
of the adaptive weapon control module using a 30 mm automatic cannon
mounted on a Cobra aircraft has been scheduled for FY86.


Figure 1. Basic Structure of Adaptive Control


Figure 2. Architecture of Adaptive Controller Module

Figure 3. The Time Response of the System with Weighting Factor -2.85


ABSTRACT. A finite-difference analysis of the exterior problem for Poisson’s
equation has an infinite-electrical-network analog. Concurrently with the recent
numerical attacks upon that problem, a theory for infinite electrical networks has been
developing and provides an alternative approach. Examples taken from the flow of petroleum
into an oil well and from well-logging with the resistivity method illustrate the unusual
kinds of connections at infinity that an infinite electrical network can have. A technique
is described for solving the exterior problem for a three-dimensional anomaly contained
within a sphere S. Infinite electrical network theory yields a driving-point conductance
matrix Y that replicates the effect of a spherical grid arising from a finite-difference
analysis for the region exterior to S. This allows the domain of analysis to be contracted
to S and its interior. The resulting analysis remains exact in the sense that the solution
for the spherical grid within and on S coupled with Y is precisely the same as that for
the infinite spherical grid.


I. INTRODUCTION. The purpose of this paper is to briefly survey a relatively
new area of research, namely, the search for the solutions for certain kinds of infinite
electrical networks, and to point out how this can lead to better computational methods
for the exterior problem for Poisson’s equation. That problem arises when the domain of
interest is infinite in extent. During the past decade or so, the exterior problem for the
numerical analysis of partial differential equations has been the subject of a substantial
amount of research (Bayliss et al. 1982, Canuto et al. 1985, Engquist and Majda 1977, Fix
and Marin 1978, Goldstein 1979 and 1982, Gustafsson and Kreiss 1979, Hagstrom and
Keller 1984 and in press). The common procedure in all these works is the introduction
of an artificial boundary that renders the domain finite and then a search for an
appropriate boundary condition on that boundary.

This work was supported by the National Science Foundation under Grants DMS-8319835 and
DMS-8521824.

For  Poisson’s  equation  (as  well  as  for  other  partial  differential  equations),  there  is 
another  way  to  solve  this  problem.  It  is  well  known  that  a  five-point  finite-difference 
approximation  of  Laplace's  operator  can  be  represented  by  a  rectangular  grid  of  positive 
resistors. As a result, the exterior problem for Poisson’s equation has an
infinite-electrical-network analog. So, instead of inserting an artificial boundary, one might
search for the solution to the infinite network. Concurrently with the appearance of the
aforementioned literature on the exterior problem, a theory for infinite electrical
networks has been developing. Almost all of the earlier work was directed toward existence
and uniqueness theorems, but more recently solutions have been found for certain infinite
gridlike resistive networks (Zemanian 1978, 1981, 1982, 1985c; Zemanian and An 1985;
Zemanian and Subramaniam 1983; Zemanian and Zemanian in press). These are
solutions for rectangular, cylindrical, and spherical resistive grids of both the grounded and
ungrounded varieties. It is required, however, that the grid’s parameters vary with only
one or at most two coordinates. This approach to the discretized exterior problem has
also been applied to polarized electromagnetic waves (Zemanian 1985a, 1985b), but now
an RLC grid ensues. In every one of these applications an operational calculus can be
exploited to obtain a rapid means of computing the solution.
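
The correspondence between the five-point approximation and a resistive grid can be
illustrated numerically. The sketch below assumes a uniform square grid with unit spacing
and unit conductances (choices made here only for illustration) and evaluates Kirchhoff's
current law at a node: the net current leaving the node through its four resistors is the
negative of the five-point Laplacian of the node voltages. The function and variable names
are hypothetical.

```python
def kcl_residual(v, i, k, conductance=1.0):
    """Net current leaving node (i, k) through its four incident resistors."""
    neighbours = [(i + 1, k), (i - 1, k), (i, k + 1), (i, k - 1)]
    return sum(conductance * (v(i, k) - v(p, q)) for p, q in neighbours)


def harmonic(x, y):
    """A discrete-harmonic voltage: its five-point Laplacian is zero."""
    return x * x - y * y


def forced(x, y):
    """A voltage whose five-point Laplacian equals 4 at every node."""
    return x * x + y * y


no_source = kcl_residual(harmonic, 3, -2)
with_source = kcl_residual(forced, 3, -2)
print(no_source, with_source)
```

A node with zero residual needs no current generator, while a nonzero residual is the
current that a generator at that node must supply, which plays the role of the right-hand
side of Poisson's equation.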

In  the  next  section  we  examine  one  aspect  of  infinite-electrical-network  theory  in  its 
current  state  of  development.  That  aspect  concerns  the  variety  of  conditions  at  infinity 
that may arise in different applications. Infinite network theory leads naturally to a
consideration of these conditions, whereas the analogous problem does not seem to have
been  taken  up  in  the  numerical  approaches  to  the  exterior  problem  listed  above. 

In the last section, we consider the operator $\nabla \cdot \sigma \nabla$ in spherical coordinates for a
medium whose parameter $\sigma$ may vary only with the radial and latitudinal coordinates r
and $\theta$ but not with the longitudinal coordinate $\phi$. We show how an exact solution for the
corresponding spherical grid coupled with an appropriate operational calculus yields a
fast computational method for solving Poisson’s equation in all of three-dimensional space
for the case where the medium is uniform except in a bounded three-dimensional region.

II. CONDITIONS AT INFINITY. Strange things can happen in an infinite
electrical  network.  We  present  three  examples  to  show  how  some  of  these  peculiarities 
arise  naturally  out  of  practical  applications. 

II.1. Example. For finite domains the resistive-network analogs of Poisson’s
equation have finite nodes, that is, no more than a finite number of branches are
incident to any node. For infinite domains, however, this need not be so. Figure 1(a)
shows  two  perfect  conductors  that  extend  out  to  infinity.  The  electrostatic  potential  in 
the  region  between  the  conductors  is  determined  by  Laplace’s  equation,  a  five-point 
discretization  of  which  yields  a  resistive  grid  whose  nodes  away  from  the  conductors  each 
have  four  incident  branches.  However,  the  nodes  of  any  conductor  are  all  shorted 
together,  and  thus  there  are  two  nodes  having  an  infinity  of  incident  branches.  This  is 
shown  in  Figure  1(b). 




Unfortunately, Kirchhoff’s current law need not hold at an infinite node (Zemanian
1974 page 275). Situations can arise where, if one adds all the incident currents flowing
toward an infinite node, as well as all the incident currents flowing away from that node,
one finds a total of one ampere flowing toward it and zero amperes flowing away. A
perspective on this anomaly can be obtained by examining a sequence of networks that
approach the infinite network with the branches that carry currents away from the node
being added one by one. What can happen is that those currents individually tend to
zero while their sum remains one. In the limit each such branch carries zero current,
and so the infinite series of those zero values must equal zero, not one. Nonstandard
analysis may provide a means of avoiding this paradox by allowing the infinite series of
infinitesimal currents to add up to one. However, the approach used in our rigorous
theory of infinite networks is to allow the nonsatisfaction of Kirchhoff’s current law at
certain infinite nodes (Zemanian 1974), the “nonrestraining” nodes (Zemanian 1986).
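
A toy version of the limiting argument is easy to write down. The sketch below assumes,
purely for illustration, that one ampere entering the node is split equally over n
outgoing branches; exact rational arithmetic shows the largest branch current shrinking
toward zero while the total stays at one for every finite n.

```python
from fractions import Fraction


def outgoing_currents(n):
    """One ampere split equally over n outgoing branches (an assumed split)."""
    return [Fraction(1, n)] * n


for n in (10, 1000, 100000):
    currents = outgoing_currents(n)
    # Largest single branch current, and the total leaving the node.
    print(n, float(max(currents)), sum(currents))
```

Taking the limit branch by branch gives zero in every branch, so the sum of the limits is
zero even though every finite sum equals one; this is the interchange of limit and
summation that fails at such a node.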

II.2. Example. The second example concerns the flow of petroleum into an oil
well through a completion zone of fractured rock surrounded by porous virgin rock. See
Figure 2(a). The pressure P is assumed to be governed by Laplace’s equation
$\nabla \cdot \sigma \nabla P = 0$ with the orifice of the oil-well pipe as a sink for the flow. The pipe’s surface is a
boundary with a null Neumann condition. Given the flow into the orifice, the pressure
variation is desired. A discretization in cylindrical coordinates yields a cylindrical grid, a
cross-section of which through the pipe’s centerline is shown in Figure 2(b). The
encircled arrows therein represent flow sinks (i.e., current generators) whose values are known.

For  finite  networks  the  extraction  of  currents  at  some  nodes  must  be  balanced  by 
the  injection  of  currents  at  other  nodes.  For  infinite  networks  this  principle  is  upheld  by 
assuming (often tacitly) that the current is returned at infinity to the network. But
where at infinity? Does it matter? Sometimes it does and sometimes it doesn’t
(Zemanian 1986). If the series of (positive) resistance values along a one-ended path P1
to infinity converges -- as can happen if the rock's permeability increases fast enough in
some direction toward infinity -- and if that series for another such path P2 diverges and
moreover P2 is not “shorted at infinity” to other such paths (to be more precise we
should refer to “pathlike extremities” (Zemanian 1986 Section II)), then it truly does
matter where at infinity the returns for the sinks are located. This must be specified if
the pressure is to be uniquely determined.
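
The distinction between the two kinds of path can be illustrated with partial sums. The
resistance profiles below are hypothetical and are not taken from the paper: along the
first path the k-th branch resistance falls off as 1/k^2 (as it might if permeability
grew quickly with distance), while along the second every branch has unit resistance.

```python
def total_resistance(branch_resistance, n_branches):
    """Sum the first n_branches series resistances along a one-ended path."""
    total = 0.0
    for k in range(1, n_branches + 1):
        total += branch_resistance(k)
    return total


def fast_decay(k):
    """Hypothetical profile with a convergent series."""
    return 1.0 / k ** 2


def uniform(k):
    """Hypothetical profile with a divergent series."""
    return 1.0


for n in (10, 1000, 100000):
    print(n, total_resistance(fast_decay, n), total_resistance(uniform, n))
```

The first total levels off below 1.645 however many branches are included, so infinity is
only a finite resistance away along that path; the second grows without bound.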

In general, one has to specify which paths to infinity are “shorted together at
infinity”, which paths to infinity are left “open-circuited at infinity” (Zemanian 1975),
and which voltage (i.e., pressure) or current (i.e., flow) sources are connected at infinity
to which such paths (Zemanian 1986). All this requires a thorough reworking of
standard electrical network theory and the introduction of some new concepts such as nodes
and branches that are connected to certain extremities of the network at infinity.
Standard infinite-graph theory does not allow such entities, and so that too must be
reworked.

II.3. Example. Figure 3(a) shows a well-logging tool for a normal-log resistivity
measurement.  A  current  probe  injects  an  electrical  current  h  into  the  rock  surrounding 
a  borehole  and  that  current  is  extracted  at  a  return  probe  on  the  earth’s  surface.  A 
potential  probe  adjacent  to  the  down-hole  current  probe  measures  the  resulting  electrical 
potential in the borehole. The probes are mounted on an insulated sonde but the rest of
the borehole is filled with drilling mud, which has a high electrical conductivity. The
borehole  is  assumed  to  extend  infinitely  above  as  well  as  infinitely  below  the  sonde.  A 
discretization  of  Laplace’s  equation  in  cylindrical  coordinates  yields  a  cylindrical  grid, 
whose  cross-section  on  a  plane  through  the  borehole  is  shown  in  Figure  3(b)  only  for  the 
section  to  the  right  of  the  borehole.  Even  though  the  earth’s  surface  is  taken  to  be 
infinitely  far  from  the  down-hole  probes,  we  may  have  a  situation  that  may  be  modeled 
by  a  conductivity  that  increases  rapidly  as  a  path  is  traced  upward  indefinitely.  Indeed, 
such  a  model  may  be  suitable  if  there  is  a  highly  (more  precisely,  a  perfectly)  conducting 
overburden  (e.g.,  a  salt  marsh).  For  such  a  model,  the  return  probe  of  the  current 
source  must  be  specified  as  being  connected  to  the  grid’s  “extremities  in  the  upward 
direction”  --  and  not  elsewhere  at  infinity. 

As  a  further  idealization,  we  might  take  it  that  the  drilling  mud  is  also  perfectly 
conducting.  In  this  case,  the  borehole  above  the  sonde  is  at  a  constant  potential,  the 
same  potential  as  the  overburden.  This  means  that  there  is  an  infinite  node  nw ,  which  is 
moreover  shorted  to  the  “extremities  in  the  upward  direction”.  This  is  illustrated  in 
Figure  3(c).  There  is  another  infinite  node  nt  representing  the  borehole  below  the  sonde, 
but  this  one  is  not  shorted  to  any  extremities  at  infinity. 

Still  another  complication  that  arises  with  infinite  electrical  networks  concerns  the 
application of Kirchhoff’s voltage law around infinite loops. Such loops must now be
considered  because  of  the  possible  connections  at  infinity.  Our  theory  designates  those 
loops  (that  is  the  finite  loops  and  also  the  “perceptible  extended”  loops  (Zemanian  1080 
pages 33-34)) on which Kirchhoff’s voltage law holds, and those loops (the “imperceptible
extended”  loops)  on  which  it  need  not  hold. 

III. DOMAIN CONTRACTIONS AROUND A THREE-DIMENSIONAL
ANOMALY.  We  now  describe  how  the  theory  of  infinite  electrical  networks  can  be 
used  to  generate  a  fast  computational  method  for  solving  a  particular  exterior  problem. 

III.1. The Model. We consider again the flow of petroleum into an oil well, but
this time we let the completion zone be a single perforated one. Such a completion zone
is made by firing a projectile through the side of the oil-well pipe to fracture a
balloon-shaped region of rock adjacent to the borehole. This is illustrated in Figure 4. Here too,
the borehole is taken to be infinitely long in both the upward and downward directions.
The flow of petroleum into the well is given, and the pressure variation in the
surrounding rock is desired.

That  pressure  P  is  determined  by  Poisson’s  equation 

(1) 

where H represents the flow sources measured positively as flow injected into the
medium and the parameter $\sigma$ is the ratio of medium permeability divided by the
petroleum’s viscosity. In taking (1) as the governing equation, we are actually assuming
that the fluid is incompressible, Newtonian, and saturates the porous rock and that
gravitational and inertial effects are negligible. As for $\sigma$, it is taken to be a constant
outside the borehole and completion zone but varies three-dimensionally and at considerably
higher values within the completion zone. The interior of the borehole’s pipe is not part
of the problem's domain.

III.2. The Spherical Grid. We choose a spherical coordinate system $(r, \theta, \phi)$
whose polar axis ($\theta = 0$ and $\pi$) coincides with the centerline of the borehole and whose
origin is directly adjacent to the center of the perforation. This too is indicated in
Figure 4. Upon choosing a discretization of the coordinates and computing the
corresponding finite differences for the partial derivatives of (1), we obtain an infinite spherical grid
of positive resistors that are fed by current sources at several points of the pipe’s
perforated orifice. (See Zemanian and Zemanian (in press) for the specifics of these
manipulations.) The returns for those current sources are at infinity; all the extremities of the
grid are taken to be shorted together at infinity. The grid’s nodes lie upon the
concentric spheres $S_1, S_2, S_3, \ldots$ shown in Figure 4 and are distributed upon each sphere as
shown in Figure 5. However, a few nodes lie within the borehole. Some of these,
namely, those on $S_1$, are simply removed along with their incident branches. The
remaining nodes -- those on $S_2$ -- are maintained, but the resistances of their incident
branches are increased. For our choice of increments this yields 48 nodes on $S_1$ and 72
nodes on all the other $S_j$. Outside $S_s$ the medium is uniform except for the borehole.
We simply ignore the borehole outside $S_s$ because there it occupies only a small fraction
of the medium. For our numerical example we have on each sphere 6 equilatitudinal
circles and 12 nodes on each such circle; thus, L = 12 in Figure 5.

III.3. A Matrix-valued Infinite Ladder Network. Our objectives are to
determine a solution (that is, the nodal pressures measured with respect to the ground
node) for this infinite spherical grid and to find a fast computational method for
computing numerical values for that solution. Our primary interest is in the nodal pressures
within and near the completion zone, but this does not mean that we can simply
truncate the medium at some arbitrarily chosen remove from $S_s$. Instead, we shall seek a
driving-point conductance matrix $Y_s$ that exactly represents the effect of the infinite
spherical grid beyond $S_s$ as measured at the nodes of $S_s$.

To this end, we decompose the spherical grid into an infinite ladder network of
n-ports where n = 72. This is shown in Figure 6 where for our model s = 5. For example,
$R_{j+1/2}$ is the n×n resistance matrix of the n-port consisting of all radial branches
connecting the nodes on $S_j$ to the nodes on $S_{j+1}$. Thus, $R_{j+1/2}$ is a diagonal matrix. If
j > s, then -- because of the numbering system of Figure 5 -- $R_{j+1/2}$ has six 12×12
main-diagonal blocks of the form cI, where c is a positive constant equal to the
resistance of a radial branch and I is the 12×12 identity matrix. This is an especially
simple form of a circulant matrix (Davis 1979).

Furthermore, $G_j$ is the conductance matrix of the n-port obtained by pairing each
node on $S_j$ with a hypothetical ground node and letting the branches on $S_j$ comprise
the interior of the n-port. In terms of the theory described in Section II, the hypothetical
ground node is in fact a node that is shorted to all the grid's extremities at infinity.
Because the n-port contains no branches to that ground, $G_j$ is a singular matrix. However,
again because of the numbering system of Figure 5, $G_j$ can be written in the form
of 6×6 blocks, each block being a 12×12 submatrix. For j > 5, the six main-diagonal
blocks are circulants with exactly three nonzero entries in each row, and the 10 adjacent
blocks are also of the form $cI$, where now c is a negative constant. All the other blocks
are null.

The facts that for j > 5 both $R_{j-1/2}$ and $G_j$ have this block form and that those
blocks are circulant matrices will be invoked to make use of an operational calculus that
will speed our computations considerably.

III. 4. An Exact Solution. The basic idea in our search for an exact solution for
the chosen grid is the replacement of the infinite spherical grid outside $S_5$ by a set of
resistors such that there is a resistor connected between every pair of nodes on $S_5$ as
well as a resistor connected between every node on $S_5$ and the ground node: 2628 resistors
altogether (2556 between pairs of the 72 nodes plus 72 to ground). These resistors will be
called the "terminating resistors" on $S_5$ and will
be so chosen that the driving-point conductance matrix $Y_5$ for the 72-port consisting of
the infinite spherical grid as seen from the 72 nodes on $S_5$, with the ground node as the
common node for all 72 ports, remains unchanged under the stated replacement. When
this is so, the vector $I_{5+1/2}$ of currents entering the spherical grid outside $S_5$ at the 72
nodes of $S_5$ is exactly determined by $I_{5+1/2} = Y_5 V_5$, where $V_5$ is the vector of node
voltages on $S_5$.


In view of the ladder network of Figure 6, where now j = 5, we can write $Y_5$ as an
infinite continued fraction of matrices:

$$ Y_5 = \cfrac{1}{R_{5+1/2} + \cfrac{1}{G_6 + \cfrac{1}{R_{6+1/2} + \cfrac{1}{G_7 + \cdots}}}} \tag{2} $$


This expression does not exist as the common limit of its even and odd truncations. In
fact, the even truncations do not exist because the $G_j$ are all singular. However, the
odd truncations

$$ Y_5^N = \cfrac{1}{R_{5+1/2} + \cfrac{1}{G_6 + \cdots + \cfrac{1}{G_N + \cfrac{1}{R_{N+1/2}}}}} \tag{3} $$

do exist for every N and correspond to the driving-point conductance matrix of the
finite spherical grid whose nodes on the sphere $S_{N+1}$ are all shorted to the ground node.
In effect, assuming that $V_5$ is given, we are seeking $I_{5+1/2}$ as the limit of the $I^N_{5+1/2}$ for
the finite spherical grids shorted to ground at their spheres $S_{N+1}$. Actually, the theory
cited in Section II dictates a unique $I_{5+1/2}$ for the infinite grid. Moreover, it has been
proven (Zemanian and Zemanian, in press, Section 9) that $I^N_{5+1/2} \to I_{5+1/2}$ as $N \to \infty$,
hence our interest in determining (2) by computing (3) for sufficiently large N.
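
The odd truncation can be evaluated from the outside in, which may make its structure easier to see. The following is a minimal sketch, not the authors' program: the function name `odd_truncation` and the list layout are assumptions made for illustration, and dense inverses are used for clarity rather than speed.

```python
import numpy as np

def odd_truncation(R, G):
    """Evaluate 1/(R[0] + 1/(G[0] + 1/(R[1] + ... + 1/(G[-1] + 1/R[-1])))).

    R holds the series resistance matrices R_{5+1/2}, ..., R_{N+1/2};
    G holds the shunt conductance matrices G_6, ..., G_N (one fewer than R).
    """
    if len(R) != len(G) + 1:
        raise ValueError("need exactly one more R than G")
    # The sphere S_{N+1} is shorted to ground, so the conductance seen
    # looking outward from S_N is simply the inverse of R_{N+1/2}.
    Y = np.linalg.inv(R[-1])
    # Work inward: add the shunt conductance, invert, add the series
    # resistance, invert again.
    for Rk, Gk in zip(reversed(R[:-1]), reversed(G)):
        Y = np.linalg.inv(Rk + np.linalg.inv(Gk + Y))
    return Y
```

Each $G_j$ is inverted only after a positive definite conductance has been added to it, which is why the odd truncations exist even though every $G_j$ is singular. With 1×1 matrices, unit resistances, and zero shunt conductances, four series sections give a conductance of 1/4.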


One  may  ask,  “Since  the  grid  is  being  truncated  at  this  point  of  the  analysis,  why 
not truncate it at the beginning and seek appropriate boundary conditions on the sphere
$S_N$ for some N?" One answer is that it is considerably easier to control the error of
truncation  by  truncating  the  exact  solution  (2)  rather  than  imposing  a  truncation  before 
an  analysis  is  made.  Another  is  that  it  is  with  this  application  of  infinite  network  theory 
that  one  is  naturally  led  to  a  derivation  of  the  exact  solution;  such  an  expression  does 
not  seem  to  have  arisen  in  the  numerical  approaches  to  the  exterior  problem  listed  in  the 
Introduction. 


Actually, we are not done until an analysis of the grid within and on $S_5$ is also
made. However, with the terminating resistors at the nodes of $S_5$ in hand, one need
merely make a standard nodal analysis to obtain the nodal pressures on and within $S_5$
exactly. This solution can then be extended to as many nodes outside $S_5$ as desired by
using pressure transfer ratios (Zemanian and Zemanian, in press, Section 7).
Alternatively, an elementary form of a limb analysis (Zemanian 1978), which is in fact a
marching technique, can be used to obtain the nodal pressures on the first several spheres
beyond $S_5$ most rapidly, but this cannot be carried too far because that marching
technique is computationally unstable.

III. 5. An Operational Calculus. The computation of (3) for a fixed N can be
facilitated by employing the discrete Fourier transform to render the circulant blocks in
the $R_{j+1/2}$ and $G_j$ into functions on a set of discrete points on the unit circle; in this
case, twelve discrete points. Thus, manipulations of the circulant matrices implicit in (3)
are replaced by corresponding manipulations of ordinary functions. This operational calculus
replaces the 72×72 matrices by 6×6 matrices dependent upon the twelve points on
the unit circle. The corresponding continued fraction is then computed twelve times and
the inverse discrete Fourier transform is applied to obtain the terminating resistors.
This provides a fast method of computing because of the fast-Fourier-transform
algorithm.
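
The property behind this operational calculus is that the discrete Fourier transform diagonalizes every circulant matrix, so a circulant system reduces to pointwise division. The sketch below illustrates this for a single 12×12 circulant block only; it does not reproduce the 6×6 block reduction of the paper, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def circulant(first_col):
    """Dense circulant matrix whose first column is first_col."""
    c = np.asarray(first_col, dtype=float)
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def circulant_solve(first_col, b):
    """Solve C x = b for a nonsingular circulant C using the DFT.

    The eigenvalues of C are the DFT of its first column, so the solve is
    a pointwise division in the transformed domain.
    """
    lam = np.fft.fft(first_col)
    return np.real(np.fft.ifft(np.fft.fft(b) / lam))

# A 12x12 circulant with three nonzero entries in each row (illustrative values).
c = np.zeros(12)
c[0], c[1], c[-1] = 3.0, -1.0, -1.0
b = np.arange(12, dtype=float)
x = circulant_solve(c, b)
```

Sums, products, and inverses of circulants all become pointwise operations on the twelve transformed values, which is what allows the continued fraction to be evaluated one frequency at a time.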

III. 6. Computational Advantages. The method described herein is especially
advantageous when one wishes to examine many shapes and permeability variations for
different completion zones, all contained within a fixed sphere, say, $S_5$. This entails the
recomputation of nodal pressures many times. Since $Y_5$ and thereby the terminating
resistors on $S_5$ do not depend upon the parameters within and on $S_5$, we can compute
those terminating resistors once and for all of the different models, given a fixed spherical
discretization for the medium outside $S_5$. In this way, the domain for our computations
has effectively been contracted down to $S_5$ and its interior, and thus a much
smaller system of equations need be solved for each model. It should be noted, however,
that because the terminating resistors connect all pairs of nodes on $S_5$, the matrix for
those equations is sparse except for a full 72×72 block corresponding to the nodes of $S_5$.
This contrasts with the usual nodal equations for a standard finite-difference analysis
within $S_N$, its matrix being sparse throughout. Nonetheless, whenever N is much larger
than J (= 5), our method will yield a considerable saving in computation time.

One way of choosing a suitable N is to successively increase its choice until the
changes in the resulting terminating resistors become acceptably small. In testing our
method for a particular choice of model (see Zemanian and Zemanian, in press, Section
8), we compared results for three different choices of N: N = 30, 55, and 105. For
example, the high and low values of pressure at the nodes of $S_3$ were .03106 and .01364
for N = 30, .03115 and .01373 for N = 55, and .03117 and .01375 for N = 105. Using
pressure transfer ratios for the computation of nodal pressures outside $S_5$, we found that
the CPU times on a UNIVAC 1100 computer for the determination of the pressures at
all the nodes within $S_N$ were 46 seconds for N = 30, 1 minute and 19 seconds for N =
55, and 2 minutes and 27 seconds for N = 105. We note again that in all these cases,
the equations needing solution were for the nodes within and on $S_5$ and thereby had a
matrix of order 336 × 336. In contrast to this, a standard finite difference analysis for
the nodes within and on $S_N$ has a matrix of order 2064 × 2064 for N = 30, 3864 × 3864
for N = 55, and 7464 × 7464 for N = 105.
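
The node and resistor counts quoted in this section can be checked with a few lines of arithmetic. In the sketch below, the counts 48 and 72 come from the text; the helper `order_full` is an assumption: it is simply a formula that happens to reproduce the three quoted matrix orders, not one stated by the authors.

```python
from math import comb

nodes_on_s1 = 48        # nodes remaining on S_1 (from the text)
nodes_per_sphere = 72   # nodes on every other sphere (from the text)

# Terminating resistors on S_5: one per pair of nodes plus one per node to ground.
terminating_resistors = comb(nodes_per_sphere, 2) + nodes_per_sphere

# Unknowns within and on S_5: the nodes of S_1 plus those of S_2, ..., S_5.
order_reduced = nodes_on_s1 + 4 * nodes_per_sphere

def order_full(N):
    """Hypothetical fit to the quoted orders for a grid truncated at index N."""
    return nodes_on_s1 + nodes_per_sphere * (N - 2)

print(terminating_resistors, order_reduced, [order_full(N) for N in (30, 55, 105)])
```

The reduced system stays at order 336 regardless of N, while the order of the conventional system grows linearly with N.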

ABSTRACT. We describe the uniformly high order accurate essentially non-oscillatory
schemes recently developed by A. Harten and S. Osher. The design involves
an essentially non-oscillatory piecewise polynomial reconstruction of the solution
from its cell averages, time evolution through an approximate solution of the resulting
initial value problem, and averaging of this approximate solution over each
cell. To solve this reconstruction problem we use a new interpolation technique
that, when applied to piecewise smooth data, gives high-order accuracy wherever
the function is smooth but avoids a Gibbs phenomenon at discontinuities.

As a result of constructing and experimenting numerically with second order
upwind schemes for the MHD equations, M. Brio and C.C. Wu found that the MHD
equations are not convex. Specifically, as the transverse magnetic field goes through
zero, the slow (fast) field becomes degenerate if the sound speed is greater (less)
than the Alfven speed. This property is called nonconvexity. As a consequence,
the solution to the Riemann problem may contain a compound wave, namely a
shock followed by a rarefaction wave of the same family, and the transverse component
of the magnetic field changes its sign across the shock wave. As an example, we
show a numerical solution which contains a compound wave. For the same Riemann
problem, the traditional solution includes a 180° Alfven wave instead.

1. UNIFORMLY HIGH ORDER ACCURATE ESSENTIALLY NON-OSCILLATORY
SCHEMES. Consider the hyperbolic initial value problem (IVP)

$$ u_t + f(u)_x = 0, \qquad u(x,0) = u_0(x). \tag{1} $$

Here u and f are m-vectors. The Jacobian matrix $\partial f/\partial u$ is assumed to have only
real eigenvalues and a complete set of linearly independent eigenvectors. Let

$$ x_j = jh, \qquad t_n = n\tau. $$

Integrating the partial differential equation (1) over the computational cell
$(x_{j-1/2}, x_{j+1/2}) \times (t_n, t_{n+1})$, we get

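$$ \bar u_j^{n+1} = \bar u_j^n - \lambda\,[\hat f_{j+1/2}(u) - \hat f_{j-1/2}(u)], \tag{2} $$

where $\lambda = \tau/h$,

$$ \hat f_{j+1/2}(u) = \frac{1}{\tau}\int_{t_n}^{t_{n+1}} f(u(x_{j+1/2}, t))\,dt \tag{3} $$

and

$$ \bar u_j^n = \frac{1}{h}\int_{x_{j-1/2}}^{x_{j+1/2}} u(x, t_n)\,dx. $$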

We denote by $v_j^n$ the numerical approximation to the cell averages $\bar u_j^n$ of the exact
solution to (2) and set $v_j^0$ to be the cell averages of the initial data. Given $v^n = \{v_j^n\}$,
we compute $v^{n+1}$ as follows:

First we reconstruct $u(x, t_n)$ out of its approximate cell averages $\{v_j^n\}$ to the
appropriate accuracy and denote the result by $L(x; v^n)$. Next we solve the IVP

$$ v_t + f(v)_x = 0, \qquad v(x,0) = L(x; v^n) \tag{4} $$

and denote its solution by $v(x,t)$. Finally we obtain $v^{n+1}$ by taking cell averages
of $v(x,\tau)$:

$$ v_j^{n+1} = \frac{1}{h}\int_{x_{j-1/2}}^{x_{j+1/2}} v(x,\tau)\,dx. \tag{5} $$

We define its total variation in x to be

$$ TV(v^n) = TV(v_h(\cdot, t_n)) = \sum_j |v_{j+1}^n - v_j^n|, $$

where $|\cdot|$ denotes any norm on $R^m$.

AMS-MOS Classification: Primary 65M10, 35L65, 35L67, 76W05, Secondary 65M05
Key Words: Conservation Laws, Finite Difference Scheme, Essentially Non-oscillatory,
MHD equations, Riemann problem.

Our goal is that, if the initial data $u_0(x)$ are piecewise smooth, then for h
sufficiently small

$$ TV(v_h(\cdot, t + \tau)) \le TV(v_h(\cdot, t)) + O(h^{N+1}), $$

where N is the order of accuracy of (2).

The averaging procedure does not increase the total variation; therefore, the
design of ENO high order accurate schemes boils down to a problem on the level of
approximation of functions: that of constructing an essentially non-oscillatory high-order
accurate interpolant of a piecewise smooth function from its cell averages.

First we consider the scalar case of (1), m = 1. In [5] we have constructed an
essentially non-oscillatory piecewise polynomial of order N, $Q^N(x; w)$, that interpolates
a piecewise-smooth function $w(x)$ at the cell interface points:

$$ Q^N(x_{j+1/2}; w) = w(x_{j+1/2}) \tag{6} $$

and satisfies, wherever $w(x)$ is smooth,

$$ \left(\frac{d}{dx}\right)^r Q^N(x \pm 0; w) = \left(\frac{d}{dx}\right)^r w(x) + O(h^{N+1-r}), \qquad r = 1, \ldots, N. \tag{7} $$

We shall use this polynomial together with two different approaches to design
ENO schemes. Those methods are:

RP: Reconstruction via the primitive function.

RD: Reconstruction via deconvolution.

We describe here the first one; for the second approach the reader is referred
to [5]. Let $W(x)$ be the primitive function of $u(x)$:

$$ W(x) = \int_a^x u(s)\,ds. \tag{8} $$

Since we wish to reconstruct $u(x)$ out of its approximate cell averages $v_j$ (dropping
the t or n dependence), we have an approximation to $W(x_{j+1/2})$:

$$ w(x_{j+1/2}) = h \sum_{k=0}^{j} v_k. \tag{9} $$
k—0 

In each cell $I_j : \{x \mid x_{j-1/2} < x < x_{j+1/2}\}$, $Q^N(x; w)$ is a polynomial of degree
N which interpolates $w(x_{j+1/2})$; i.e., for all j

$$ Q^N(x_{j+1/2}; w) = w(x_{j+1/2}). \tag{10} $$

Thus $Q^N(x; w)$ is a continuous piecewise polynomial, and both of $\frac{d}{dx} Q^N(x \pm 0; w)$
are globally well defined.
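
Reconstruction via the primitive function can be sketched in a few lines for a uniform scalar grid. This is a minimal illustration under stated assumptions, not the construction of [5]: the stencil is grown by the common rule of comparing the magnitudes of successive Newton divided differences, interfaces are indexed from 0 with a taken as the left end of the grid, `numpy.polyfit` stands in for an explicit Newton form, and all function names are hypothetical.

```python
import numpy as np

def divided_differences(x, y, order):
    """table[k][i] is the k-th Newton divided difference y[x_i, ..., x_{i+k}]."""
    table = [np.asarray(y, dtype=float)]
    for k in range(1, order + 1):
        prev = table[-1]
        table.append((prev[1:] - prev[:-1]) / (x[k:] - x[:-k]))
    return table

def eno_stencil(table, j, N):
    """Leftmost interface index of the N+1 interpolation points chosen for cell j."""
    left = j                      # start from the two interfaces bounding cell j
    for k in range(2, N + 1):     # grow the stencil one point at a time
        can_left = left >= 1
        can_right = left < len(table[k])
        if can_left and (not can_right or
                         abs(table[k][left - 1]) < abs(table[k][left])):
            left -= 1             # the smoother candidate lies to the left
    return left

def eno_reconstruct(v, h, N, samples=5):
    """Sample L(x; v) = d/dx Q^N(x; w) inside every cell; returns (xs, values)."""
    v = np.asarray(v, dtype=float)
    M = len(v)
    if M < N:
        raise ValueError("need at least N cells")
    x = h * np.arange(M + 1)                        # cell interfaces
    w = np.concatenate(([0.0], h * np.cumsum(v)))   # primitive at the interfaces, as in (9)
    table = divided_differences(x, w, N)
    xs = np.empty((M, samples))
    vals = np.empty((M, samples))
    for j in range(M):
        s = eno_stencil(table, j, N)
        # Degree-N interpolant of w on the chosen stencil, shifted for conditioning.
        q = np.polyfit(x[s:s + N + 1] - x[j], w[s:s + N + 1], N)
        xs[j] = np.linspace(x[j], x[j + 1], samples)
        vals[j] = np.polyval(np.polyder(q), xs[j] - x[j])
    return xs, vals
```

For step data the chosen stencils never straddle the jump, so the reconstruction shows no overshoot, while for cell averages of a smooth polynomial of degree below N the reconstruction is exact up to round-off.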

Our approximation to (1) can be obtained by solving (4) with

$$ v(x,0) = \frac{d}{dx} Q^N(x; w^n) = L(x; v^n), $$

obtaining $v(x,t)$, $0 < t \le \tau$, and then computing cell averages (5). This can be
rewritten, using the divergence theorem, as

$$ v_j^{n+1} = v_j^n - \lambda\,(\hat f^n_{j+1/2} - \hat f^n_{j-1/2}), \tag{11} $$

since

$$ \frac{1}{h}\int_{x_{j-1/2}}^{x_{j+1/2}} L(x; v^n)\,dx = \frac{Q^N(x_{j+1/2}; w^n) - Q^N(x_{j-1/2}; w^n)}{h} = v_j^n $$

because of (6) and (9).

Here $\hat f^n_{j+1/2}$ is computed by averaging the flux function $f(u)$ applied to
$v(x_{j+1/2}, t)$ as in (3).

For general $f(u)$ the explicit solution to (4) can be difficult to obtain and
various approximations might be applied [5].

In the next section we will describe some results of the application of the upwind
schemes to the MHD equations.


2. NONCONVEXITY OF THE MHD EQUATIONS. The equations of ideal
magnetohydrodynamics (MHD) characterize the flow of a conducting fluid in the
presence of a magnetic field. They represent the coupling of the fluid dynamical equations
with Maxwell's equations of electrodynamics. By neglecting displacement current,
electrostatic forces, effects of viscosity, resistivity, and heat conduction, one obtains
the following ideal MHD equations [1]:

$$ \rho_t + \nabla\cdot(\rho\mathbf{u}) = 0, $$
$$ (\rho\mathbf{u})_t + \nabla\cdot(\rho\mathbf{u}\mathbf{u} + I P^* - \mathbf{B}\mathbf{B}) = 0, $$
$$ \mathbf{B}_t + \nabla\cdot(\mathbf{u}\mathbf{B} - \mathbf{B}\mathbf{u}) = 0, $$
$$ E_t + \nabla\cdot((E + P^*)\mathbf{u} - \mathbf{B}(\mathbf{B}\cdot\mathbf{u})) = 0, $$
with the additional requirement that $\nabla\cdot\mathbf{B} = 0$, which is satisfied if it is satisfied
initially. In the above equations the following notations are used: $\rho$ for density,
$\mathbf{u}$ for velocity, $\mathbf{B}$ for magnetic field, P for static pressure, $P^*$ for full pressure,
$P^* = P + \frac{1}{2}|\mathbf{B}|^2$, E for energy, $E = \frac{1}{2}\rho|\mathbf{u}|^2 + P/(\gamma - 1) + \frac{1}{2}|\mathbf{B}|^2$, and $\gamma$ for the
ratio of specific heats. We consider the one-dimensional MHD equations
which are obtained from the above system by assuming that all variables depend
on x and t only. The resulting equations are:

$$ \rho_t + (\rho u)_x = 0, $$
$$ (\rho u)_t + (\rho u^2 + P^*)_x = 0, $$
$$ (\rho v)_t + (\rho u v - B_x B_y)_x = 0, $$
$$ (\rho w)_t + (\rho u w - B_x B_z)_x = 0, $$
$$ (B_y)_t + (B_y u - B_x v)_x = 0, $$
$$ (B_z)_t + (B_z u - B_x w)_x = 0, $$
$$ E_t + ((E + P^*)u - B_x(B_x u + B_y v + B_z w))_x = 0. $$

Here $B_x$ = const, and u, v, and w are the three components of the velocity field.

The eigenvalues of the Jacobian matrix can be written in nondecreasing order
as

$$ u - c_f,\quad u - c_a,\quad u - c_s,\quad u,\quad u + c_s,\quad u + c_a,\quad u + c_f, $$

where $c_f$, $c_a$, $c_s$ are called the fast, Alfven, and slow characteristic speeds,
respectively. They can be expressed as

$$ c_{f,s}^2 = \tfrac{1}{2}\left((a^*)^2 \pm \sqrt{(a^*)^4 - 4a^2 b_x^2}\right), \tag{12} $$

with the following notations:

$$ b_x = B_x/\sqrt{\rho}, \qquad b_y = B_y/\sqrt{\rho}, \qquad b_z = B_z/\sqrt{\rho}, $$
$$ b^2 = b_x^2 + b_y^2 + b_z^2, $$
$$ (a^*)^2 = (\gamma - 1)\left(H - \tfrac{1}{2}(u^2 + v^2 + w^2)\right) + (2 - \gamma)b^2, $$

and a is the sound speed, which is related to $a^*$ by

$$ a^2 = (a^*)^2 - b^2. $$

In equation (12) the plus sign is for $c_f$ and the minus sign for $c_s$. There are
two points where these eigenvalues may coincide:

(1) At $B_x = 0$, $c_s = c_a = 0$, thus u is an eigenvalue of multiplicity 5.

(2) At $B_y^2 + B_z^2 = 0$, $c_f^2 = \max(a^2, b_x^2)$ and $c_s^2 = \min(a^2, b_x^2)$. Therefore, for
the case $a^2 \ne b_x^2$, either $c_f^2 = b_x^2$ or $c_s^2 = b_x^2$, thus the multiplicity of $u \pm c_a$ is
2; and for the case $a^2 = b_x^2$, $c_f^2 = c_s^2 = b_x^2$ and the multiplicity of $u \pm c_a$
is 3.
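
The two degenerate cases can be checked numerically from equation (12). The sketch below rests on assumptions that the text does not state explicitly: the Alfven speed is taken as $c_a = |b_x|$ (implied by case (2)), the sound speed is taken as $a^2 = \gamma P/\rho$, and the function name and the sample states are illustrative only.

```python
import math

def mhd_speeds(rho, p, Bx, By, Bz, gamma):
    """Return (c_f, c_a, c_s) from equation (12), with b = B / sqrt(rho)."""
    bx2 = Bx ** 2 / rho
    b2 = (Bx ** 2 + By ** 2 + Bz ** 2) / rho
    a2 = gamma * p / rho              # assumed ideal-gas sound speed squared
    astar2 = a2 + b2                  # (a*)^2 = a^2 + b^2
    # Guard against tiny negative values caused by round-off.
    disc = math.sqrt(max(astar2 ** 2 - 4.0 * a2 * bx2, 0.0))
    cf = math.sqrt(0.5 * (astar2 + disc))
    cs = math.sqrt(max(0.5 * (astar2 - disc), 0.0))
    ca = math.sqrt(bx2)               # assumed Alfven speed |b_x|
    return cf, ca, cs

# Vanishing transverse field with a^2 > b_x^2: the slow speed meets the Alfven speed.
cf, ca, cs = mhd_speeds(1.0, 1.0, 0.75, 0.0, 0.0, 2.0)
```

With a nonzero transverse field the three speeds are distinct and ordered $c_s \le c_a \le c_f$; as the transverse field goes to zero, one of the magnetoacoustic speeds merges with $c_a$, which is the degeneracy discussed next.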

The usually used eigenvectors [1] do not form a complete set, i.e., near the
points where either $B_x = 0$ or $B_y^2 + B_z^2 = 0$, this set is not well-defined and the
matrix with these eigenvectors as its columns becomes singular. However, by proper
renormalization a complete set can be obtained [2]. Specifically, near the points where
$B_t = 0$ ($B_t^2 = B_y^2 + B_z^2$), the normalization factor is proportional to $B_t$ for the
fast wave if $a^2 < b_x^2$ and for the slow wave if $a^2 > b_x^2$. Due to the required
normalization, these waves are not genuinely nonlinear as usually believed. For
example, using the set of right eigenvectors given by Jeffrey and Taniuti it follows
that both fast and slow waves are genuinely nonlinear (Theorem E.1 in Ref. 1), i.e.,
$(\nabla\lambda)\cdot R \ne 0$ for these waves. Now, by using our complete set of eigenvectors, one
gets $(\nabla\lambda)\cdot R \propto B_t$ when $B_t$ is small for the fast (slow) wave if $a^2 < b_x^2$ ($a^2 > b_x^2$).
Thus, when $B_t$ is zero, either the slow or the fast wave becomes degenerate. Therefore,
they are nonconvex.

As a consequence of the nonconvexity, there exist solutions to some coplanar
MHD Riemann problems, whose initial transverse magnetic fields on the left and
right states have opposite signs, such that the transverse magnetic field changes its
sign through the slow (fast) compound wave, which consists of a slow (fast) shock
wave with a slow (fast) rarefaction wave attached to it. The slow (fast) compound wave can exist
if the condition $a^2 > b_x^2$ ($a^2 < b_x^2$) holds. Those solutions satisfy the physical entropy
condition and Liu's admissibility criteria [3] and are suggested by the mathematical
theory of the scalar nonconvex conservation law as well as by Liu's work on nonconvex
Euler equations of hydrodynamics. Figure 1 illustrates a solution to the
coplanar MHD Riemann problem containing a slow compound wave (SM) obtained
by an upwind numerical scheme [2].

The initial data for the problem are as follows: $\rho_l = 1$, $u_l = 0$, $v_l = 0$, $p_l = 1$,
$\rho_r = 0.125$, $u_r = 0$, $v_r = 0$, $p_r = 0.1$, $B_x = 0.75$. The initial discontinuity is
in the middle of the computational interval. The solution is shown after 800 steps
with $\Delta t = 0.2$. It consists of a fast rarefaction wave (FR) and a slow compound wave
(SM) moving to the left, and a contact (C), a slow shock (SS) and a weak fast rarefaction
wave (FR) moving to the right. The numerical solution is in good agreement with
the appropriate Rankine-Hugoniot jump relations and Riemann invariants across
discontinuities and rarefaction waves.

To illustrate the behavior of the eigenvalue along the shock curve, we use the
numerical data for the left state with respect to the slow shock ($\rho$ = 0.6763, u =
0.6366, v = 0.2333, $B_y$ = 0.5840, p = 0.4574), and resolve the jump relations
using $(B_y)_r$ as a parameter [4]. The dependence of the shock speed (s) and the
slow characteristic speed ($u - c_s$) on $(B_y)_r$ is shown in Figure 2 for
the entropy nondecreasing shocks. Point (b) denotes the intersection of these two
curves. The values of the variables at the intersection point are as follows: $\rho$ =
0.7935, u = 0.4983, v = -1.290, $B_y$ = -0.307, p = 0.667, and a = 0.2995.

The portion of the solution containing the compound wave, which consists of a slow
shock (SS) with a rarefaction wave (SR) attached to it, is shown in Figure 3 for the u
variable. The continuous line in this figure shows the position of the slow shock and
the attached rarefaction wave, using the above calculations for the shock position
and the appropriate Riemann invariants to find the position of the rarefaction wave. If the
right state is obtained using Riemann invariants and the intersection point as a left
state, then the values are as follows: $\rho$ = 0.6965, u = 0.5987, v = -1.583, $B_y$ =
-0.5341, p = 0.5157. The dotted line shows the values obtained by the second
order upwind scheme. Those values are $\rho$ = 0.6962, u = 0.5997, v = -1.578, p =
0.5133, which have a maximum deviation of 0.4% in density and pressure from the
state obtained by using the left numerical state and the appropriate Rankine-Hugoniot
relations and Riemann invariants. The agreement is within the numerical accuracy
used for those calculations.

Figure 4 illustrates the relation between the shock speed (s) and the characteristic
speeds ($\lambda_s$) for different points on the shock curve in Fig. 2, using an x-t
diagram. The left state corresponds to the origin in Fig. 2, and the right state
corresponds to the points denoted by a, b, c, and d on the shock curve in this figure.
The first case (a) illustrates the shock with convergent characteristics and is
similar to the shocks usually encountered for the Euler equations, which correspond to
genuinely nonlinear fields. The second case (b) has the right characteristic speed
equal to the shock speed. This allows a rarefaction wave to be attached to such
a shock, as in the compound wave considered in the above numerical example. Case (c)
shows a shock having divergent characteristics on the right hand side. In this case,
two waves of the same family can travel in the same direction without one being
overtaken by the other. The last diagram (d) shows a particular case of the previous
one, in which the characteristic speed is constant across the shock. It corresponds to a 180°
Alfven wave; namely, the density, the pressure, and the x-component of the velocity are constant
across the shock and the transverse magnetic field reverses its sign.

The traditional solution to this problem is to include a 180° Alfven wave, as suggested
by the 3D solution of the MHD Riemann problem. This is because, as we have
mentioned before, all waves except Alfven waves are coplanar for the case $B_x \ne 0$,
and therefore Alfven waves have to be introduced if the MHD Riemann problem is
not coplanar.

Numerically, Lax-Friedrichs and upwind schemes give the solution containing
the compound wave, while the Lax-Wendroff scheme seems to give a one-parameter family
of solutions depending on the magnitude of the artificial or physical viscosity, which
approaches the solution containing a 180° Alfven wave as the resistivity becomes smaller.

Therefore, one of the approaches to picking the "physical" solution is to study those
problems for the full resistive 3D MHD equations as the resistivity and $B_z$ tend to
zero in arbitrary order. A valuable tool in this investigation can be an arbitrarily
high order essentially non-oscillatory scheme, which will allow one to study the effects
of resistivity.

Figure 2. Dependence of the shock speed (s) and the slow characteristic
speed ($\lambda_s$) on $(B_y)_r$.

Abstract

We  discuss  finite  difference  methods  for  partial  differential  equations  on  polar  and 
spherical  coordinate  systems.  The  distinctive  feature  of  these  coordinate  systems  is  the 
coordinate  system  singularity  at  the  origin.  We  show  how  to  accurately  and  conveniently 
determine  the  solution  at  the  origin  for  both  scalar  and  vector  fields.  We  also  discuss 
the  Fourier  method  to  approximate  derivatives  with  respect  to  the  angular  variable  in 
polar  coordinates.  Computational  examples  are  presented  illustrating  the  accuracy  and 
efficiency  of  the  method  for  hyperbolic  and  elliptic  equations,  and  also  for  the  computation 
of  vector  fields  at  the  origin. 

1.  Introduction 

In this paper we consider the use of polar and spherical coordinates with finite difference
methods, determining how to achieve accurate results with convenience. Although
the use of finite difference methods with polar coordinates is not at all new, there are several
features of their use that are not well known among numerical analysts, computational
scientists, and engineers. In particular, the accurate treatment of vector fields with polar
coordinates is not widely known.

It  is  the  aim  of  this  paper  to  bring  together  the  pertinent  information  and  present 
it  in  an  organized  way.  As  such,  this  paper  presents  few  new  ideas,  but  it  is  hoped  that  it 
will  be  a  useful  addition  to  the  literature  on  numerical  methods. 

Much  of  what  is  presented  here  also  applies  to  axially  symmetric  problems  in  polar 
or  spherical  coordinates;  the  common  feature  of  these  problems  is  the  singular  nature  of 
the  coordinate  system  at  the  origin. 


Sponsored  by  the  United  States  Army  under  Contract  No.  DAAG29-80-C-0041.  The  first 
author  was  also  supported  in  part  by  NSF  grant  MCS-8306880  and  ONR  grant  N00014- 
84-K-0454. 

2. The Center Formulas

Consider the plane with a polar coordinate system. Each point is determined by
its polar coordinates $(r, \phi)$ which, for points other than the origin, are unique up to integer
multiples of $2\pi$ in $\phi$. However, the origin has the coordinates $(0, \phi)$ for all angles $\phi$, and
it is this lack of uniqueness in the coordinates of the origin that introduces difficulties for
numerical methods. These difficulties are displayed in the Jacobian of the coordinate map
which takes ordered pairs in $[0, \infty) \times \mathbb{R}$ to points in the plane. The Jacobian vanishes
on $\{0\} \times \mathbb{R}$. However, it is important to realize that this singularity is present in the
coordinate map and polar representations of functions and need not be present in the
functions themselves. In this paper we shall only consider functions which are smooth in
the domain being considered.

The  singular  behavior  of  the  polar  coordinate  system  at  the  origin  usually  precludes 
the  direct  use  of  finite  difference  approximations  to  differential  equations  at  that  point.  We 
consider  therefore  the  use  of  interpolation  formulas  to  accurately  determine  the  solution  at 
the  origin.  We  begin  by  considering  a  function  defined  in  the  plane,  without  considering  a 
coordinate  system. 


Consider a smooth function u defined in a neighborhood of a point P in the plane.
We wish to express u(P) in terms of averages, $\bar u(P, \rho)$, on circles of radius $\rho$ centered at P.
We begin by expanding u in a Taylor series in cartesian coordinates with the origin at P,

$$ u(x, y) = \sum_{k+l \le N} \frac{x^k y^l}{k!\,l!}\,\frac{\partial^{k+l} u}{\partial x^k \partial y^l}(P) + R_N, \tag{2.1} $$

where $R_N$ is the remainder term. Then

$$ \bar u(P, \rho) = \frac{1}{2\pi}\int_0^{2\pi} u(\rho\cos\phi, \rho\sin\phi)\,d\phi \tag{2.2} $$

and using (2.1)

$$ \bar u(P, \rho) = \sum_{2l \le N} c_l\,\rho^{2l}\,\nabla^{2l} u(P) + R_N, \tag{2.3} $$

where $c_l = 1/(4^l (l!)^2)$. Formula (2.3), which is independent of a coordinate system, is the
basis for determining a function value at the origin given values of the function at points
nearby and given the differential equation satisfied by u.

Consider now a uniform finite difference grid with grid points $(r_i, \phi_j)$ for integers i
and j with $i \ge 0$ and $0 \le j \le J - 1$, where $r_i = i\Delta r$, $\phi_j = j\Delta\phi$ for $\Delta r > 0$ and $\Delta\phi = 2\pi/J$.
For a function u defined in a neighborhood of the origin P, we have

$$ \bar u(l\Delta r) = \bar u(P, l\Delta r) = \frac{1}{J}\sum_{j=0}^{J-1} u_{l,j} + O(\Delta\phi^m), \tag{2.4} $$

where $u_{l,j} = u(r_l, \phi_j)$ and m is a positive integer whose value will be considered in section
5. Using (2.3) we then have the relations

$$ u(P) = \bar u(\Delta r) + O(\Delta r^2) + O(\Delta\phi^m) \tag{2.5} $$

and

$$ u(P) = \tfrac{1}{3}\left(4\bar u(\Delta r) - \bar u(2\Delta r)\right) + O(\Delta r^4) + O(\Delta\phi^m), \tag{2.6} $$

which can be used with finite difference methods to determine values at the origin. Higher
order formulas can be obtained by similar means.
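
The two center formulas are easy to compare numerically. The following sketch assumes a test function given in cartesian form and sampled at J equally spaced angles; the function names are chosen here for illustration and do not come from the paper.

```python
import numpy as np

def circle_average(u, radius, J):
    """Equal-weight average of u(x, y) over J equally spaced points on a circle about the origin."""
    phi = 2.0 * np.pi * np.arange(J) / J
    return np.mean(u(radius * np.cos(phi), radius * np.sin(phi)))

def origin_second_order(u, dr, J):
    """Formula (2.5): the ring average at radius dr."""
    return circle_average(u, dr, J)

def origin_fourth_order(u, dr, J):
    """Formula (2.6): combination of the ring averages at radii dr and 2*dr."""
    return (4.0 * circle_average(u, dr, J) - circle_average(u, 2.0 * dr, J)) / 3.0

# Test function with a nonzero laplacian at the origin, where u(0, 0) = 1.
u = lambda x, y: np.exp(x)
err2 = origin_second_order(u, 0.1, 16) - 1.0
err4 = origin_fourth_order(u, 0.1, 16) - 1.0
```

For this test function and $\Delta r = 0.1$ the error of (2.5) is close to $\Delta r^2/4$, while that of (2.6) is several hundred times smaller, in line with the stated orders.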


3.  The  Laplacian  with  Polar  Grids 


When the differential equation being solved involves the laplacian operator, then
formula (2.3) can be used to special advantage. We consider as examples the Poisson
equation

$$ \nabla^2 u = f \tag{3.1} $$

and the wave equation

$$ u_{tt} = \nabla^2 u \tag{3.2} $$

on a disk of unit radius.

For  the  Poisson  equation  (3.1)  consider  the  semi-discrete  finite  difference  approxi 

mation 


1  (  ttj+|(0)  ~  Uj(4>)  !/,(</>)  -  U,-1(0)\ 

r,Ar  \r,+  i  Ar  f,'2  Ar  / 


1  d2u,(e>) 

ry  dtf 


=  fi(<t>)  for  i  >  0,  (3.3) 


where  we  discretize  only  the  radial  direction.  Employing  (2.3)  we  have  at  the  origin 

Ar2 

U( i  =  ti(Ar) - V2u(0)  4  0(Ar 4) 

4 

or 

ua  =  u(Ar)  -  ^f-/(0)  -  0(Ar4).  (3.4) 

4 

This formula maintains the second-order accuracy of the scheme and is easy to use. However,
even when the equation being solved involves the laplacian, (2.6) may be more accurate
or convenient to use than formulas such as (3.4). Formula (3.4) has been used by
Swarztrauber and Sweet [6] for solving the Poisson equation in a disk, and by Swarztrauber
[7] for the Poisson equation on a sphere.

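For the wave equation (3.2) we also consider a semi-discrete approximation in which
only time and the radial direction are discretized with the angular variation continuous.
Let $u_i^n(\phi)$ be the approximation to $u(n\Delta t, r_i, \phi)$. At the origin we have

    $u_0^n = \bar{u}^n(\Delta r) - \frac{1}{4}\Delta r^2\,\nabla^2 u_0^n + O(\Delta r^4)$

    $\quad\ = \bar{u}^n(\Delta r) - \frac{1}{4}\Delta r^2\,\frac{\partial^2 u_0^n}{\partial t^2} + O(\Delta r^4)$

    $\quad\ = \bar{u}^n(\Delta r) - \frac{\Delta r^2}{4\,\Delta t^2}\left(u_0^{n+1} - 2u_0^n + u_0^{n-1}\right) + O(\Delta r^4) + O(\Delta r^2\Delta t^2)$,

using a central difference approximation in time. This gives the formula at the origin as

    $u_0^{n+1} = 2u_0^n - u_0^{n-1} + 4\left(\frac{\Delta t}{\Delta r}\right)^2\left(\bar{u}^n(\Delta r) - u_0^n\right)$    (3.5)

which maintains the second-order accuracy of the scheme. Example 1 in section 7 shows
that this formula gives accurate results. Similar methods can also be used with parabolic
equations.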
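
As an illustration of (3.5), the sketch below advances the origin value by one time step using exact data from the plane-wave solution (7.2) and compares with the exact value at the new time. It is a check under stated assumptions, not the authors' code; the function names and the values of `dt`, `dr`, and `J` are chosen only for the example.

```python
import math

def origin_update(u0_now, u0_prev, ubar_now, dt, dr):
    """Formula (3.5): u0^{n+1} = 2 u0^n - u0^{n-1} + 4 (dt/dr)^2 (ubar^n(dr) - u0^n)."""
    return 2.0 * u0_now - u0_prev + 4.0 * (dt / dr) ** 2 * (ubar_now - u0_now)

def exact(t, x, y):
    # Exact solution (7.2) of the wave equation.
    return math.cos(t - 0.6 * x - 0.8 * y)

def ring_average(t, rho, J):
    """Average of the exact solution over the J grid points on the circle of radius rho."""
    dphi = 2.0 * math.pi / J
    return sum(exact(t, rho * math.cos(j * dphi), rho * math.sin(j * dphi))
               for j in range(J)) / J

dt, dr, J, t = 0.01, 0.05, 16, 0.3
predicted = origin_update(exact(t, 0.0, 0.0), exact(t - dt, 0.0, 0.0),
                          ring_average(t, dr, J), dt, dr)
step_error = abs(predicted - exact(t + dt, 0.0, 0.0))
print(step_error)
```

The one-step error is of order $\Delta t^2\Delta r^2 + \Delta t^4$, consistent with the truncation terms in the derivation above.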

4.  Vector  Fields  with  Polar  Grids 

In addition to the coordinate singularity at the origin, the polar coordinate representation
of vector fields introduces an additional difficulty. Let F be a vector field
defined on a domain on which there is a polar coordinate system. The polar coordinate
representation assigns to each vector F(P) the component in the radial direction and the
component in the direction of increasing angle. This representation is unique at all points
other than the origin.

At the origin the vector F(0) has a different representation for each choice of the
radial direction. This is best illustrated using the mapping between the polar and cartesian
representations. Let $(U, V)$ be the usual cartesian representation of the vector field F, which
is uniquely determined; then the polar representation $(u, v)$ is given by

    $u = U\cos\phi + V\sin\phi$
    $v = -U\sin\phi + V\cos\phi$.    (4.1)

Since at the origin the pair $(U, V)$ is single valued, (4.1) shows the multivalued nature of
the polar representation.

Using a polar grid the vector field F will be represented, and approximated, by
vectors $(u_{ij}, v_{ij})$ at each grid point $(r_i, \phi_j)$. At the origin, there is a representation $(u_{0j}, v_{0j})$
for each coordinate direction $(0, \phi_j)$. For consistency these representations must be related
by the formulas (4.1). That is, there are values $(U_0, V_0)$ such that

    $u_{0j} = U_0\cos\phi_j + V_0\sin\phi_j$
    $v_{0j} = -U_0\sin\phi_j + V_0\cos\phi_j$.    (4.2)


The values of $U_0$ and $V_0$ can be obtained by formulas such as (2.6). For example, on a
uniform grid define

    $\bar{U}(i\Delta r) = \frac{1}{J}\sum_{j=0}^{J-1}\left(u_{ij}\cos\phi_j - v_{ij}\sin\phi_j\right)$
    $\bar{V}(i\Delta r) = \frac{1}{J}\sum_{j=0}^{J-1}\left(u_{ij}\sin\phi_j + v_{ij}\cos\phi_j\right)$.    (4.3)

Then $U_0$ and $V_0$ can be approximated by

    $U_0 = \frac{1}{3}\left(4\bar{U}(\Delta r) - \bar{U}(2\Delta r)\right)$
    $V_0 = \frac{1}{3}\left(4\bar{V}(\Delta r) - \bar{V}(2\Delta r)\right)$.    (4.4)

These values can then be used in (4.2) to give the values of $(u_{0j}, v_{0j})$. Example 3 in section 7
demonstrates the accuracy of this method as applied to the Stokes equations. This method
has been used in Strikwerda [4] and Nagel and Strikwerda [3] with excellent results.
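
The steps (4.1) through (4.4) can be sketched as follows. The cartesian field used here is an arbitrary smooth example chosen for illustration (it is not one of the paper's test problems), and all function names are hypothetical.

```python
import math

def cartesian_field(x, y):
    # Assumed smooth test field with value (1, 2) at the origin.
    return 1.0 + x + y * y, 2.0 - x * y

def polar_components(r, phi):
    """Polar representation (u, v) of the field at (r, phi), formula (4.1)."""
    U, V = cartesian_field(r * math.cos(phi), r * math.sin(phi))
    return (U * math.cos(phi) + V * math.sin(phi),
            -U * math.sin(phi) + V * math.cos(phi))

def ring_cartesian_average(r, J):
    """Averages (4.3) of the cartesian components recovered from the polar data on one ring."""
    dphi = 2.0 * math.pi / J
    Ubar = Vbar = 0.0
    for j in range(J):
        phi = j * dphi
        u, v = polar_components(r, phi)
        Ubar += u * math.cos(phi) - v * math.sin(phi)
        Vbar += u * math.sin(phi) + v * math.cos(phi)
    return Ubar / J, Vbar / J

def origin_values(dr, J):
    """Formula (4.4) for (U0, V0), then (4.2) for the polar values (u_0j, v_0j)."""
    U1, V1 = ring_cartesian_average(dr, J)
    U2, V2 = ring_cartesian_average(2.0 * dr, J)
    U0 = (4.0 * U1 - U2) / 3.0
    V0 = (4.0 * V1 - V2) / 3.0
    dphi = 2.0 * math.pi / J
    polar = [(U0 * math.cos(j * dphi) + V0 * math.sin(j * dphi),
              -U0 * math.sin(j * dphi) + V0 * math.cos(j * dphi)) for j in range(J)]
    return U0, V0, polar

U0, V0, polar_at_origin = origin_values(0.1, 8)
print(U0, V0, polar_at_origin[0])
```

Because the example field is a low-degree polynomial, the extrapolation (4.4) reproduces the origin value up to rounding; for a general smooth field the error is $O(\Delta r^4)$.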


For finite difference grids which are not uniform in the angular variable, formulas
(4.3) should be replaced by

    $\bar{U}(i\Delta r) = \frac{1}{\sigma}\sum_{j=0}^{J-1}\left[u_{ij}\left(\sin\phi_{j+1} - \sin\phi_{j-1}\right) + v_{ij}\left(\cos\phi_{j+1} - \cos\phi_{j-1}\right)\right]$    (4.5)

    $\bar{V}(i\Delta r) = \frac{1}{\sigma}\sum_{j=0}^{J-1}\left[-u_{ij}\left(\cos\phi_{j+1} - \cos\phi_{j-1}\right) + v_{ij}\left(\sin\phi_{j+1} - \sin\phi_{j-1}\right)\right]$    (4.6)

where

    $\sigma = 2\sum_{j=0}^{J-1}\sin(\phi_{j+1} - \phi_j)$.

The formulas (4.5) and (4.6) are exact for the case when the vector field has constant
cartesian components in a neighborhood of the origin.

5. The Fourier Method


We now consider the Fourier method for the approximation of derivatives with
respect to $\phi$. Consider a periodic discrete function $f_j$ defined on grid points $\phi_j = j\Delta\phi$
with $\Delta\phi = 2\pi/J$. The object of both the finite difference and Fourier methods is to obtain
approximations to $\partial f/\partial\phi$ at the grid points. The Fourier method begins with the finite
Fourier series representation of $f_j$, i.e., for the case when $J$ is an even integer,

    $f_j = b_0 + \sum_{k=1}^{J/2-1}\left(a_k\sin k\phi_j + b_k\cos k\phi_j\right) + b_{J/2}\cos\left(\tfrac{J}{2}\phi_j\right)$.    (5.1)

Note that $\cos(\tfrac{J}{2}\phi_j) = (-1)^j$. Replacing $\phi_j$ in (5.1) by a continuous variable $\phi$ we can
approximate $\partial f/\partial\phi$ at $\phi_j$ as

    $\left.\frac{\partial f}{\partial\phi}\right|_j \approx \sum_{k=1}^{J/2-1}\left(a_k k\cos k\phi_j - b_k k\sin k\phi_j\right)$    (5.2)

and similarly

    $\left.\frac{\partial^2 f}{\partial\phi^2}\right|_j \approx -\sum_{k=1}^{J/2-1}\left(a_k k^2\sin k\phi_j + b_k k^2\cos k\phi_j\right) - (-1)^j\left(\tfrac{J}{2}\right)^2 b_{J/2}$.

The coefficients $a_k$ and $b_k$ are easily obtained by

    $b_k = \frac{2}{J}\sum_{j=0}^{J-1} f_j\cos k\phi_j$
    $a_k = \frac{2}{J}\sum_{j=0}^{J-1} f_j\sin k\phi_j$,    (5.3)

for $0 < k < J/2$, and

    $b_0 = \frac{1}{J}\sum_{j=0}^{J-1} f_j$
    $b_{J/2} = \frac{1}{J}\sum_{j=0}^{J-1} (-1)^j f_j$.    (5.4)

The Fourier method has the advantage that it gives far higher accuracy for a given
number of grid points than do finite difference methods (Gottlieb and Orszag [2]). Alternatively,
to attain a given accuracy the Fourier method requires significantly fewer grid points
than do finite difference methods. For example 2 of section 7, finite difference methods
would require at least three times as many grid points in the angular direction to obtain
comparable accuracy.

The efficiency gained by the Fourier method over the finite difference method for
the angular variation is due to the natural periodicity in the variable $\phi$. Spectral methods
can be used with the radial variation but not necessarily with the same gain in efficiency
(Gottlieb and Orszag [2]).
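
The accuracy claim can be illustrated with a direct transcription of (5.1) through (5.4). This is a sketch only: it evaluates the sums directly rather than with a fast transform, the test function $e^{\sin\phi}$ is an assumption made for the example, and the function names are hypothetical.

```python
import math

def fourier_coefficients(f):
    """Coefficients a_k, b_k of the finite Fourier series (5.1); len(f) must be even."""
    J = len(f)
    half = J // 2
    dphi = 2.0 * math.pi / J
    a = [0.0] * (half + 1)
    b = [0.0] * (half + 1)
    b[0] = sum(f) / J
    b[half] = sum((-1) ** j * f[j] for j in range(J)) / J
    for k in range(1, half):
        a[k] = 2.0 / J * sum(f[j] * math.sin(k * j * dphi) for j in range(J))
        b[k] = 2.0 / J * sum(f[j] * math.cos(k * j * dphi) for j in range(J))
    return a, b

def fourier_derivative(f):
    """First derivative at the grid points, formula (5.2)."""
    J = len(f)
    dphi = 2.0 * math.pi / J
    a, b = fourier_coefficients(f)
    return [sum(k * (a[k] * math.cos(k * j * dphi) - b[k] * math.sin(k * j * dphi))
                for k in range(1, J // 2))
            for j in range(J)]

def central_difference(f):
    """Second-order periodic central difference, for comparison."""
    J = len(f)
    dphi = 2.0 * math.pi / J
    return [(f[(j + 1) % J] - f[j - 1]) / (2.0 * dphi) for j in range(J)]

J = 16
phi = [2.0 * math.pi * j / J for j in range(J)]
f = [math.exp(math.sin(p)) for p in phi]
exact = [math.cos(p) * math.exp(math.sin(p)) for p in phi]
err_fourier = max(abs(d - e) for d, e in zip(fourier_derivative(f), exact))
err_fd = max(abs(d - e) for d, e in zip(central_difference(f), exact))
print(err_fourier, err_fd)
```

On this smooth periodic function with only 16 angular points, the Fourier derivative is several orders of magnitude more accurate than the central difference.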


Line successive-over-relaxation (LSOR) can easily be used to solve elliptic boundary
value problems in polar coordinates in which the Fourier method is used to approximate the
derivatives with respect to $\phi$. The basic formula for LSOR as applied to the semi-discrete
approximation (3.3) is

    $\left(-\frac{2}{\Delta r^2} + \frac{1}{r_i^2}\frac{\partial^2}{\partial\phi^2}\right)\left(u_i^{n+1} - u_i^n\right) = -\omega\left[\frac{1}{r_i\,\Delta r}\left(r_{i+1/2}\,\frac{u_{i+1}^{n+1}(\phi) - u_i^n(\phi)}{\Delta r} - r_{i-1/2}\,\frac{u_i^n(\phi) - u_{i-1}^n(\phi)}{\Delta r}\right) + \frac{1}{r_i^2}\frac{\partial^2 u_i^n}{\partial\phi^2} - f_i\right]$.    (5.5)

In (5.5) the order of progression through the grid for the LSOR is in the direction of
decreasing radius. When the Fourier method is used to approximate the derivatives with
respect to $\phi$, the Fourier coefficients of the update, $u_i^{n+1} - u_i^n$, can be easily obtained from
the coefficients of the right-hand side of (5.5). That is, the right-hand side of (5.5) is
evaluated for each value of $j$, then the Fourier coefficients are calculated. Dividing the
coefficients of the $k$th mode by $-(2/\Delta r^2 + k^2/r_i^2)$ gives the coefficients of the update, from
which the update is determined at each value of $j$. This method is used in examples 2 and
3 of section 7.


6.  Quadrature  Formulas 

The approximation of integrals by sums arises in several contexts in the use of finite
difference methods on polar grids. As we have seen, the approximations at the origin (2.5)
and (2.6) use integrals in $\phi$ at various values of $r$. Also, the accurate determination of
integral quantities over the domain requires quadrature formulas in $r$ and $\phi$.

We begin by considering integration in the angular variable only. We consider a
$2\pi$-periodic function $f(\phi)$. We first consider the error resulting from approximating the
integral

    $\int_0^{2\pi} f(\phi)\,d\phi$    (6.1)

by the sum

    $\sum_{j=0}^{J-1} f(\phi_j)\,\Delta\phi$    (6.2)

where $\Delta\phi = 2\pi/J$ and $\phi_j = j\Delta\phi$. By the theory of Fourier series we have

    $f(\phi) = \sum_{n=-\infty}^{\infty} a_n e^{in\phi}$    (6.3)

where

    $a_n = \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\,e^{-in\phi}\,d\phi$.    (6.4)

Thus

    $\sum_{j=0}^{J-1} f(\phi_j)\,\Delta\phi = \sum_{j=0}^{J-1}\sum_{n=-\infty}^{\infty} a_n e^{in\phi_j}\,\Delta\phi = 2\pi\sum_{k=-\infty}^{\infty} a_{kJ}$,

since $\sum_{j=0}^{J-1} e^{in\phi_j}$ vanishes unless $n$ is a multiple of $J$. The integral (6.1) is precisely $2\pi a_0$; thus
the error in the approximation (6.2) is

    $2\pi\sum_{k=-\infty,\ k\ne 0}^{\infty} a_{kJ}$.

By the definition of the $a_n$ in (6.4) we have

    $a_n = \frac{1}{2\pi}\,(in)^{-m}\int_0^{2\pi} f^{(m)}(\phi)\,e^{-in\phi}\,d\phi$

if $f$ is $m$ times differentiable. Thus

    $|a_n| \le C_m |n|^{-m}$,

for some constant $C_m$ depending on $f$. Thus the error in the approximation (6.2) is
bounded by

    $4\pi C_m\sum_{k=1}^{\infty}|kJ|^{-m} = O(J^{-m}) = O(\Delta\phi^m)$.    (6.5)
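
The rapid decay in (6.5) is easy to observe numerically. The sketch below applies the sum (6.2) to the smooth periodic function $e^{\cos\phi}$, whose integral over one period is $2\pi I_0(1)$; the test function, the series evaluation of $I_0$, and the function names are assumptions made for the illustration.

```python
import math

def periodic_sum(f, J):
    """The sum (6.2): equally weighted values of f at J equally spaced angles."""
    dphi = 2.0 * math.pi / J
    return sum(f(j * dphi) for j in range(J)) * dphi

def bessel_i0(x, terms=30):
    """Modified Bessel function I_0 evaluated from its power series."""
    return sum((x * x / 4.0) ** k / math.factorial(k) ** 2 for k in range(terms))

def f(phi):
    return math.exp(math.cos(phi))

exact = 2.0 * math.pi * bessel_i0(1.0)
errors = {J: abs(periodic_sum(f, J) - exact) for J in (4, 8, 16)}
print(errors)
```

Doubling $J$ from 8 to 16 reduces the error from roughly $10^{-6}$ to the level of rounding, which is much faster than any fixed power of $\Delta\phi$ would give.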


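We now consider quadrature for the unit circle using uniform spacing in $r$ and $\phi$.
We consider

    $\int_0^1\!\int_0^{2\pi} f(r,\phi)\,r\,dr\,d\phi = 2\pi\int_0^1 \bar{f}(r)\,r\,dr$    (6.6)

where $\bar{f}(r)$ may be approximated to $O(\Delta\phi^m)$ as in (6.2). Using the trapezoid formula we
have

    $2\pi\int_0^1 \bar{f}(r)\,r\,dr = 2\pi\sum_{i=0}^{I-1}\frac{1}{2}\left[\bar{f}_i r_i + \bar{f}_{i+1} r_{i+1}\right]\Delta r + O(\Delta r^2)$

    $\qquad = 2\pi\sum_{i=1}^{I-1}\bar{f}_i r_i\,\Delta r + \pi\,\bar{f}_I r_I\,\Delta r + O(\Delta r^2)$.

Hence

    $\int_0^1\!\int_0^{2\pi} f(r,\phi)\,r\,dr\,d\phi = \sum_{i=1}^{I-1}\sum_{j=0}^{J-1} f_{ij}\,r_i\,\Delta r\,\Delta\phi + \frac{1}{2}\sum_{j=0}^{J-1} f_{Ij}\,r_I\,\Delta r\,\Delta\phi + O(\Delta r^2, \Delta\phi^m)$.    (6.7)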
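
A short sketch of (6.7) follows, assuming the radial grid $r_i = i/I$ with $r_I = 1$. The integrand $r^2(1+\cos\phi)$, whose integral over the unit disk is $\pi/2$, and the function name `disk_quadrature` are chosen only for the illustration.

```python
import math

def disk_quadrature(f, I, J):
    """Formula (6.7) on the unit disk with r_i = i/I and phi_j = 2*pi*j/J."""
    dr = 1.0 / I
    dphi = 2.0 * math.pi / J
    total = 0.0
    for i in range(1, I + 1):
        r = i * dr
        weight = 0.5 if i == I else 1.0   # trapezoid end weight on the boundary ring
        ring = sum(f(r, j * dphi) for j in range(J))
        total += weight * ring * r * dr * dphi
    return total

def f(r, phi):
    return r * r * (1.0 + math.cos(phi))

exact = math.pi / 2.0
err_coarse = abs(disk_quadrature(f, 20, 12) - exact)
err_fine = abs(disk_quadrature(f, 40, 12) - exact)
print(err_coarse, err_fine, err_coarse / err_fine)
```

Halving $\Delta r$ reduces the error by a factor of four, in agreement with the $O(\Delta r^2)$ term in (6.7); the angular error is negligible for this integrand.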

7.  Computational  Results 

In this section we present results of computations using the formulas discussed in
the previous sections applied to three test problems. The first test problem is to solve the
second-order wave equation. The two formulas (2.6) and (3.5) for determining the solution
at the origin are compared. The second test problem is to solve an elliptic equation using
the LSOR method given in section 5 to solve the discrete equations. The third test problem
uses the Stokes equations to illustrate the use of the formulas for vector fields at the origin.

The first test problem was to solve the second-order wave equation

    $u_{tt} = \nabla^2 u$    (7.1)

in the unit disk for $0 \le t \le 1$. The exact solution we used was

    $u(t,x,y) = \cos(t - .6x - .8y)$.    (7.2)

The equation for the time advancement is

    $u_{ij}^{n+1} = 2u_{ij}^n - u_{ij}^{n-1} + \Delta t^2\,\nabla_h^2 u_{ij}^n$,    (7.3)

where the discrete laplacian, $\nabla_h^2$, is given by the left-hand side of (3.3) and the derivatives
with respect to $\phi$ are approximated by the Fourier method. The formula for the first
time-step is based on a Taylor series in time and is

    $u_{ij}^1 = u_{ij}^0 + \Delta t\,(u_t)_{ij}^0 + \frac{1}{2}\Delta t^2\,\nabla_h^2 u_{ij}^0$    (7.4)

where $u_{ij}^0$ and $(u_t)_{ij}^0$ were obtained from the exact solution.

Both the interpolation formula (2.6) and the formula (3.5) were used to determine
the solution at the origin. The interpolation formula was applied using $\bar{u}(\Delta r)$ and $\bar{u}(2\Delta r)$
at the given time level to compute $u$ at the origin for that same time level. The results
of four test cases are displayed in Table I, where $I$ and $J$ are the number of radial and
angular grid points, respectively, and $K$ is the number of time steps. Both the $L^2$ norm of
the error and the error at the origin are shown for each case. The two formulas are seen to
be comparable in accuracy, but the interpolation formula is slightly more accurate. This
was also observed in all the other cases in which these two formulas were compared. Since
formulas (2.6) and (3.5) yielded comparable results, we used the simpler formula (2.6) for
all subsequent runs.


Table I. Two Center Methods

      GRID              FORMULA 1                 FORMULA 2
   I    J    K     ||u_err||    c_err       ||u_err||    c_err
  21   16   160    .5586(-4)   .1010(-3)    .5685(-4)   .1069(-3)
  41   20   220    .1662(-4)   .3177(-4)    .1623(-4)   .3105(-4)

A list of cases using formula (2.6) is given in Table II. The data show that errors
are relatively insensitive to $J$, the number of angular grid points, for the chosen values of
$J$. The number of time steps must be chosen so that the scheme is stable. No attempt
was made to determine the stability condition for this example. Because the accuracy of
the scheme depends on the three parameters $I$, $J$, and $K$ it is difficult to discern the order
of accuracy of the scheme.

Table II. Wave Equation Results

   I    J    K     ||u_err||    c_err
  11   12    40    .2050(-3)   .3511(-3)
  21   12    80    .5314(-4)   .9447(-4)
  21   16   100    .5294(-4)   .9525(-4)
  21   20   120    .5397(-4)   .9727(-4)
  41   12   160    .1595(-4)   .2998(-4)
  41   16   180    .1623(-4)   .3117(-4)
  41   20   320    .1662(-4)   .3177(-4)

Tests were made using a non-uniform radial grid for this test case. We found that
in all cases it degraded the accuracy. Further study is needed to determine if non-uniform
grids can be used to give good accuracy and less restrictive stability limitations on the
time step.

The second test problem was to solve the elliptic equation

    $\nabla^2 u - c(x,y)\,u = 0$    (7.5)

on the unit disk with $u$ specified on the boundary. The exact solution was given by

    $u(x,y) = \exp((x - 0.1)(y - 0.5))$    (7.6)

with

    $c(x,y) = (x - 0.1)^2 + (y - 0.5)^2$.    (7.7)

The equation (7.5) was approximated using the left-hand side of (3.3) for the laplacian
with the Fourier method being used to approximate the derivatives with respect to
$\phi$. The solutions were obtained with the LSOR method discussed at the end of section 5.
If finite difference methods are used in the angular variable, then a direct solver such as
that of Swarztrauber and Sweet [6] can be used.

The results of several test runs are displayed in Table III. The number of grid
points is given along with the iteration parameter $\omega$ and the tolerance on the updates.
The iterative procedure was stopped when

    $\|u^{n+1} - u^n\|/\omega \le tol$.    (7.8)


The  number  of  iterations  required  for  convergence  is  seen  to  be  dependent  on  the  number 
of  radial  grid  points  and  not  on  the  number  of  angular  grid  points.  The  accuracy  at  the 
origin  is  seen  to  be  relatively  independent  of  the  value  of  J,  as  is  expected  from  the  analysis 
of  section  5.  This  was  also  noted  for  test  problem  1.  The  norm  of  the  error  is,  however, 
dependent  on  J  for  small  values  of  J.  If  J  is  sufficiently  large  then  the  second-order 
accuracy  of  the  scheme  is  seen.  A  center  formula  similar  to  (3.4)  gave  results  comparable 
to  those  obtained  by  (2.6).  The  results  shown  were  obtained  by  using  center  formula  (2.6). 

Table III. Poisson equation results

   I    J   iter    ω     tol    ||u_err||    c_err
  11   12    42    1.5   1(-5)    .13(-2)   -.11(-2)
  11   16    42    1.5   1(-5)    .13(-2)   -.11(-2)
  21    8    53    1.8   1(-6)    .39(-2)   -.63(-3)
  21   12    52    1.8   1(-6)    .38(-3)   -.28(-3)
  21   16    52    1.8   1(-6)    .35(-3)   -.29(-3)
  41   12   131    1.9   1(-7)    .19(-3)   -.74(-4)
  41   16   131    1.9   1(-7)    .90(-4)   -.75(-4)
  41   20   131    1.9   1(-7)    .90(-4)   -.75(-4)
  61   12   28?    1.9   1(-7)    .17(-3)   -.26(-4)
  61   16   283    1.9   1(-7)    .32(-4)   -.27(-4)
  61   20   281    1.9   1(-7)    .30(-4)   -.26(-4)
  61   24   286    1.9   1(-7)    .32(-4)   -.27(-4)

The third test problem was to solve the Stokes equations

    $\nabla^2 u - \frac{u}{r^2} - \frac{2}{r^2}\frac{\partial v}{\partial\phi} - \frac{\partial p}{\partial r} = 0$

    $\nabla^2 v - \frac{v}{r^2} + \frac{2}{r^2}\frac{\partial u}{\partial\phi} - \frac{1}{r}\frac{\partial p}{\partial\phi} = 0$    (7.9)

    $\frac{1}{r}\frac{\partial (ru)}{\partial r} + \frac{1}{r}\frac{\partial v}{\partial\phi} = 0$

on the unit disk with the velocity components $u$ and $v$ given on the boundary.

The exact solution was given by

    $u(r,\phi) = r\sin\phi\,(r\cos\phi - a)(r - a\cos\phi)/R^4 + \frac{1}{2}(r - a\cos\phi)/R^2$

    $v(r,\phi) = a\,r\sin^2\phi\,(r\cos\phi - a)/R^4 + \frac{1}{2}\,a\sin\phi/R^2$

    $p(r,\phi) = 2r\sin\phi\,(r\cos\phi - a)/R^4$

with

    $R^2 = r^2 + a^2 - 2ar\cos\phi$    (7.10)

where $a$ had the value 1.5. Notice that for this solution the polar representation of the
solution at the origin is multiply valued with

    $(u(0,\phi),\ v(0,\phi)) = \left(-\frac{\cos\phi}{2a},\ \frac{\sin\phi}{2a}\right)$    (7.11)

which corresponds to a vector of magnitude $(2a)^{-1}$ in the direction of the negative x-axis.


The system (7.9) was approximated using the discrete laplacian as given in (3.3)
with the Fourier approximation of derivatives with respect to $\phi$. The system was solved
with the iterative method as given in Strikwerda [1984b] with the LSOR method used to
update the velocities. Explicitly the formulas are:


and 


(- 


Ar2 


1  5 2 
r,2502 


)(A<i) 


.1/41 


<*■>(  .  (ri+l/2‘ 

r,Ar  \ 

Ar2 

1  52u"  u" 

i  v  1,;  uij 

2  at-, 

r2  502 

r? 

r?  a«> 

IK**: 

P‘U 

Pi  4  2 ,] 

i  *  2 ‘IPiJ  lj  ^  3p»,  ]  '  Pi-lj  \  \ 

VAr  )  ’ 


(- 


Ar2 


I  *!_ 

r2  502 


)  (At-y 


(' 


i+1/2  ’ 


«..J  -  <,) 

Ar3 


~  ri- 1/2 


K  ~  Ci !>) 

Ar2 


I  1  1  dp.,j 

r,2  502  r2  r*  50  r  5<Z> 


) 


with

    $u_{ij}^{n+1} = u_{ij}^n + \Delta u_{ij}^n$

and

    $v_{ij}^{n+1} = v_{ij}^n + \Delta v_{ij}^n$.

The pressure was updated by


_,+i  =  „  / 1  (rt+iur++i1,>  - 

P,}  P,]  1  \rt  2Ar 


u-+l,;  ^i,;  '  ■2U.-l,;  +  1  gt,«j 

6Ar  r,  50 


(7.12) 


(7.13) 


(7.14) 


where $\gamma$ is an iteration parameter, as described in Strikwerda [5]. The third-order differences
with respect to $r$ in (7.12) and (7.14) are necessary to preserve the regularity of the scheme
and hence the smoothness of the solution, e.g. Bube and Strikwerda [1] and Strikwerda
[4]. The derivatives with respect to $\phi$ which are marked with a caret in (7.13) and (7.14)
are computed as in the Fourier method with the addition of the term

±  WiX-')'-  (7-15) 

where the plus sign is used in (7.14) and the minus sign in (7.13). These terms are included
to ensure the regularity of the scheme and hence the smoothness of the solution. Without
terms such as these the solution would contain Fourier modes with wavelength $2\Delta\phi$ of
sufficient amplitude to affect the accuracy of the solution.

The results of test problem 3 are displayed in Tables IV and V. In Table IV the $L^2$
norms of the error are displayed for the velocity components and the pressure. That is,

    $\|u_{err}\| = \left(\frac{1}{\pi}\sum_{i=1}^{I}\sum_{j=0}^{J-1}\left|u(r_i,\phi_j) - u_{ij}\right|^2 r_i\,\Delta r\,\Delta\phi\right)^{1/2}$

where the initial factor of $\pi^{-1}$ is included to normalize by the area of the disc. The error
for $v$ is computed similarly. The expression $u(r_i,\phi_j)$ is the exact solution evaluated at
$(r_i,\phi_j)$ and $u_{ij}$ is the computed solution at that grid point.

Table IV. Norm Errors for Problem 3

   I    J    ||u_err||      ||v_err||      ||p_err||     δ_h
  11   16    [illegible]    [illegible]    .34          -1.5(-3)
  11   24    3.9(-5)        3.3(-5)        3.2(-2)      -5.9(-5)
  21   32    2.1(-6)        2.1(-6)        4.0(-3)      -2.3(-6)
  41   40    1.1(-7)        1.2(-7)        3.6(-4)      -1.5(-7)

Because the pressure is defined only to within an additive constant we use

    $\|p_{err}\| = \left(\frac{1}{\pi}\sum_{i=1}^{I}\sum_{j=0}^{J-1}\left|p(r_i,\phi_j) - p_{ij} - \Delta p\right|^2 \epsilon_i\,r_i\,\Delta r\,\Delta\phi\right)^{1/2}$    (7.16)

where

    $\epsilon_i = 1/2$ if $i = 0$ or $I$, and $\epsilon_i = 1$ otherwise,

as required by the trapezoid rule (6.7), and $\Delta p$ is the average value of $p(r_i,\phi_j) - p_{ij}$
computed over the disc, i.e.

    $\Delta p = \frac{1}{\pi}\sum_{i=1}^{I}\sum_{j=0}^{J-1}\left(p(r_i,\phi_j) - p_{ij}\right)\epsilon_i\,r_i\,\Delta r\,\Delta\phi$.    (7.17)

Also displayed in Table IV is the value of $\delta_h$, which is the average of the discrete
approximation to the divergence of the velocity field. The finite difference and Fourier
scheme does not enforce the condition

    $\nabla\cdot\mathbf{u} = 0$

on the discrete solution; rather, the iterative method converges to a solution with

    $\nabla_h\cdot\mathbf{u}_{ij} = \delta_h$    (7.18)

where $\delta_h$ is the average of the left-hand side of (7.18). Thus the value of $\delta_h$ is an
a posteriori indicator of the accuracy of the discrete solution. As seen in Table IV the
numerical method gives very good solutions to the Stokes equation.

Table V. Errors at the center for Problem 3

   I    J      u          v          p
  11   16    2.8(-3)    8.1(-4)   -1.8(-7)
  11   24    8.6(-5)    1.8(-3)    1.7(-7)
  21   32    7.3(-5)    4.0(-4)    9.9(-8)
  41   40    7.2(-6)    1.1(-4)    1.2(-7)

Table V displays the errors at the center for test problem 3. The errors displayed
for $u$ and $v$ are the errors for $\phi$ equal to 0 at the origin. The error in the pressure is

    $p(0) - p_{0,0} - \Delta p$

where $\Delta p$ is the average of the difference between $p(r_i,\phi_j)$ and $p_{ij}$ taken over the whole
disc given by (7.17). Note that the computation of $\Delta p$ does not use the values at the
origin.

Table VI. Iteration Parameters for Problem 3

   I    J     γ     ω      tol      iterations
  11   16    .4    1.6    10(-?)       73
  11   24    .4    1.6    10(-4)       65
  21   32    .2    1.7    10(-?)      188
  41   40    .1    1.8    10(-5)      596

Table VI gives the iteration parameters and the resulting number of iterations for
each of the cases reported in Tables IV and V. The iterative method was considered to
have converged when the successive changes in $u$ and $v$ were less than tol times $\omega$
and when the changes in $p$ deviated from its average value by less than tol times $\gamma$. That
is, when the value of

    $\left(\sum_{i=0}^{I}\sum_{j=0}^{J-1}\left(\Delta p_{ij}^n - \overline{\Delta p}^{\,n}\right)^2 \epsilon_i\,r_i\,\Delta r\,\Delta\phi\right)^{1/2}$    (7.19)

was less than tol times $\gamma$ the solution was considered converged.


The values of the expressions (7.16) and (7.19) were each computed in one pass
through the data by the following modification of West's algorithm (West [8]). To compute
the value of

    $Q = \sum_{i,j}\left(X_{ij} - \bar{X}\right)^2 a_{ij}$

where

    $\bar{X} = \frac{1}{A}\sum_{i,j} X_{ij}\,a_{ij}$,  $A = \sum_{i,j} a_{ij}$,

initialize with

    $A = 0$,  $Q = 0$,  $\bar{X} = 0$.

Then for $i = 0$ to $I$, and $j = 0$ to $J - 1$,

    $Q \leftarrow Q + A\,a_{ij}\left(X_{ij} - \bar{X}\right)^2/(A + a_{ij})$
    $\bar{X} \leftarrow \bar{X} + a_{ij}\left(X_{ij} - \bar{X}\right)/(A + a_{ij})$    (7.20)
    $A \leftarrow A + a_{ij}$.

Thus, to compute the expression (7.19) we have

    $X_{ij} = \Delta p_{ij}^n$

and

    $a_{ij} = r_i\,\Delta r\,\Delta\phi$ for $1 \le i \le I-1$,  $a_{ij} = \frac{1}{2}\,r_i\,\Delta r\,\Delta\phi$ for $i = I, 0$.

This algorithm makes the computation of expressions (7.16) and (7.19) only slightly
more difficult than the computation of the usual norm.
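
The one-pass update can be sketched as follows. This is an illustrative transcription of the weighted running-mean recurrence, not the authors' code; the function name and the sample data are hypothetical, and skipping zero weights (such as those at $r_0 = 0$) is an assumption added here to avoid a division by zero on the first step.

```python
def weighted_sum_of_squares(values, weights):
    """One pass over (X, a) pairs; returns Q = sum a (X - mean)^2, the weighted mean, and A."""
    A = 0.0      # running sum of weights
    Q = 0.0      # running weighted sum of squared deviations
    mean = 0.0   # running weighted mean
    for x, a in zip(values, weights):
        if a == 0.0:
            continue  # assumed handling: a zero weight contributes nothing
        total = A + a
        delta = x - mean
        Q += A * a * delta * delta / total
        mean += a * delta / total
        A = total
    return Q, mean, A

values = [1.0, 2.0, 4.0, 8.0]
weights = [0.0, 1.0, 2.0, 0.5]
Q, mean, A = weighted_sum_of_squares(values, weights)

# Two-pass reference computation for comparison.
A_ref = sum(weights)
mean_ref = sum(x * a for x, a in zip(values, weights)) / A_ref
Q_ref = sum(a * (x - mean_ref) ** 2 for x, a in zip(values, weights))
print(Q, mean, Q_ref, mean_ref)
```

The one-pass and two-pass results agree to rounding, which is the property that lets (7.16) and (7.19) be evaluated without storing or revisiting the data.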

Conclusions 

The results of the test problems show that the methods presented in this paper can
be used to compute accurate solutions to equations on domains with polar grids. The basic
formulas can be used with most numerical procedures. The use of the Fourier method,
while not essential to the center formulas, is very convenient and efficient to use with polar
grids.

Abstract. We discuss adaptive finite element methods for solving initial-boundary
value problems for vector systems of parabolic partial differential equations
in one and two space dimensions.

One-dimensional systems are discretized using piecewise linear finite element
approximations in space and a backward difference code for stiff ordinary differential
systems in time. A spatial error estimate is calculated using piecewise quadratic
approximations that employ nodal superconvergence to increase computational efficiency.
This error estimate is used to move and refine the finite element mesh in
order to equidistribute a measure of the total spatial error and to satisfy a prescribed
error tolerance. Ordinary differential equations for the spatial error estimate and the
mesh motion are integrated in time using the same backward difference software that
is used to determine the finite element solution.

Two-dimensional systems are discretized using piecewise bilinear finite element
approximations in space and backward difference software in time. A spatial error
estimate is calculated using piecewise cubic approximations that take advantage of
nodal superconvergence. This error estimate is used to locally refine a stationary finite
element mesh in order to satisfy a prescribed spatial error tolerance.

Some examples are presented in order to illustrate the effectiveness of our error
estimation technique and the performance of our adaptive algorithm.


1 This research was partially supported by the U.S. Air Force Office of Scientific Research, Air
Force Systems Command, USAF, under Grant Number AFOSR 85-0156 and the U.S. Army
Research Office under Contract Number DAAL 03-86-K-0112. This work was used to partially fulfill
the Ph.D. requirements of the first author at the Rensselaer Polytechnic Institute.

1. Introduction. Adjerid and Flaherty [1-3] developed adaptive finite element
methods for solving m-dimensional vector systems of partial differential equations
having the form


    $M(x,t)\,u_t + f(x,t,u,\nabla u) = \sum_{k=1}^{d}\left[D^k(x,t,u)\,u_{x_k}\right]_{x_k}$,  $x \in \Omega$, $t > 0$,    (1a)

subject to the initial and boundary conditions

    $u(x,0) = u^0(x)$,  $x \in \Omega \cup \partial\Omega$,    (1b)

    either $u_i(x,t) = c_i(x,t)$ or $\sum_{k=1}^{d}\sum_{j=1}^{m} D_{ij}^k\,(u_j)_{x_k}(x,t)\,\nu_k = c_i(x,t)$,
    for $x \in \partial\Omega$, $t > 0$, $i = 1, 2, \ldots, m$.    (1c)

They considered problems in one ($d = 1$) and two ($d = 2$) spatial dimensions with
$x = [x_1, \ldots, x_d]^T$ denoting a position vector in $R^d$, $t$ denoting time, and $\Omega$ being
either a segment of the real line or a rectangle. The subscripts $t$ and $x_k$ denote temporal
and spatial partial derivatives, respectively, and $\nu = [\nu_1, \ldots, \nu_d]^T$ denotes the
unit outer normal vector to the boundary $\partial\Omega$ of $\Omega$. Problems were assumed to be
parabolic and have an isolated solution; thus, $M$ and $D^k$, $k = 1, \ldots, d$, are positive
definite $m \times m$ matrices.


Adjerid and Flaherty discretized (1) in space using Galerkin's method with a
piecewise linear polynomial basis in one dimension and piecewise bilinear polynomials
in two dimensions. An a posteriori estimate of the spatial discretization error was
calculated using Galerkin's method with piecewise quadratic functions in one dimension
and piecewise cubic functions in two dimensions. In each case, a nodal superconvergence
property of the finite element method was used to neglect errors at
nodes and, thus, improve computational efficiency. The error estimate was used to
control global [1] and local [2, 3] refinement procedures that added and/or deleted
finite elements to the mesh in order to satisfy a prescribed global measure of the spatial
discretization error. For one-dimensional problems, the error estimate was
further used to move the finite element mesh so as to equidistribute the global error
measure. Ordinary differential equations for the finite element solution, error estimate,
and, in one dimension, mesh motion were integrated in time using the backward
difference code DASSL [18] for stiff differential and algebraic systems.

Initially, a global refinement procedure was used in combination with mesh
motion to satisfy prescribed error tolerances in the $H^1$ norm [1]. This procedure was
replaced by a more efficient local mesh refinement strategy and some problem dependent
parameters were removed from the mesh moving scheme [2]. In particular,
numerical experiments indicated that the performance of the error estimation procedure
could deteriorate when the system of equations governing mesh motion was
too stiff. Adjerid and Flaherty [2] remedied this defect by limiting the stiffness of
the mesh moving equations and using refinement, instead of mesh motion, to equidistribute
the error estimate in these situations. They subsequently extended their finite
element, error estimation and adaptive local refinement procedures to two-dimensional
parabolic problems [3] and proved that the error estimate of [1, 2] converged
to the true discretization error in $H^1$ as the mesh is refined for linear one-dimensional
parabolic systems [4].

In Section II of this paper, we review the one-dimensional adaptive procedures
of Adjerid and Flaherty [1, 2], describe some improvements to their mesh refinement
scheme, and present some examples that illustrate the relationship and interaction
between mesh motion and refinement. The essential details of Adjerid and
Flaherty's [3] two-dimensional procedure and the dynamic data structures used in its
implementation are summarized in Section III. The results of a nonlinear two-dimensional
example are also presented in Section III. Finally, in Section IV, we discuss
our results and suggest some future directions.


II. ONE-DIMENSIONAL ADAPTIVE PROCEDURES. The one-dimensional
version of problem (1) consists of solving

    $M(x,t)\,u_t + f(x,t,u,u_x) = \left[D(x,t,u)\,u_x\right]_x$,  $x \in (a,b)$, $t > 0$,    (2a)

    $u(x,0) = u^0(x)$,  $x \in [a,b]$,    (2b)

    either $u_i(x,t) = c_i(t)$ or $\sum_{j=1}^{m} D_{ij}\,(u_j)_x(x,t) = c_i(t)$,
    for $x = a, b$, $t > 0$, $i = 1, 2, \ldots, m$.    (2c)

The unit subscripts on $x$ and superscripts on $D$ have been omitted for simplicity.

The procedure for discretizing (2) and estimating the spatial discretization error
of its solution are identical to our earlier work [1, 2] and are briefly summarized in
Section II.1. The essential details of our current adaptive procedure are presented in
Section II.2 and some examples illustrating its capabilities and the interplay between
mesh motion and refinement are presented in Section II.3.

II.1. Discrete System. We construct a weak form of (2) by assuming
$u \in H_E^1$, selecting a test function $v \in H_0^1$, multiplying (2a) by $v$, integrating it on
$a \le x \le b$, and integrating the diffusive term by parts to obtain

$$(v, M u_t) + (v, f) + A(v,u) = v^T D(x,t,u)\,u_x \Big|_a^b, \quad \text{for all } v \in H_0^1, \quad t > 0, \tag{3a}$$

where

$$(v,u) = \int_a^b v(x,t)^T u(x,t)\,dx, \qquad A(v,u) = \int_a^b v_x^T D(x,t,u)\,u_x\,dx. \tag{3b,c}$$

Recall that the Sobolev space $H^1$ consists of functions that are square integrable and
have square integrable first spatial derivatives. Functions belonging to $H_E^1$ are
further restricted to satisfy any essential (Dirichlet) boundary conditions in (2c),
while functions in $H_0^1$ must satisfy homogeneous versions of any essential boundary
conditions.

Initially, $u$ must satisfy

$$(v,u) = (v,u^0), \quad \text{for all } v \in H_0^1, \quad t = 0, \tag{3d}$$

and any natural (Neumann) boundary conditions in (2c) should be used to replace
$D u_x$ in the last term of (3a).

Finite element solutions of (3) are constructed by selecting finite dimensional
approximations $U \in S_E^N \subset H_E^1$ and $V \in S_0^N \subset H_0^1$ of $u$ and $v$, respectively, and
finding $U$ such that

$$(V, M U_t) + (V, f) + A(V,U) = V^T D(x,t,U)\,U_x \Big|_a^b, \quad \text{for all } V \in S_0^N, \quad t > 0, \tag{4a}$$

$$(V,U) = (V,u^0), \quad \text{for all } V \in S_0^N, \quad t = 0. \tag{4b}$$

Specifically, we introduce a partition

$$\pi(t,N) := \{\, a = x_0(t) < x_1(t) < \cdots < x_N(t) = b \,\} \tag{5}$$

of $[a,b]$ into $N$ moving subintervals $(x_{i-1}(t), x_i(t))$, $i = 1, 2, \ldots, N$, $t \ge 0$, and
select $S_E^N$ and $S_0^N$ to consist of piecewise linear polynomials with respect to this
partition. The system of ordinary differential equations that results from this spatial
discretization can be integrated in time using one of the many excellent software
packages for solving stiff differential systems. We found that the backward difference
code DASSL (cf. Petzold [18]) for differential and algebraic systems best fit our
purposes.

The spatial discretization error of the finite element solution

$$e(x,t) = u(x,t) - U(x,t) \tag{6}$$

satisfies (3) with $u$ replaced by $U + e$, i.e.,

$$(v, M(U_t + e_t)) + (v, f(\cdot,t,U+e,U_x+e_x)) + A(v,U+e) = v^T D(x,t,U+e)(U_x + e_x) \Big|_a^b, \quad \text{for all } v \in H_0^1, \quad t > 0, \tag{7a}$$

$$(v,e) = (v, u^0 - U), \quad \text{for all } v \in H_0^1, \quad t = 0. \tag{7b}$$

We approximate $e$ by a function $E \in \hat{S}_0^N$, where $\hat{S}_0^N$ is a finite dimensional
subspace of $H_0^1$ consisting of piecewise quadratic functions that vanish on $\pi(t,N)$. We
further approximate $v$ by $V \in \hat{S}_0^N$ and determine $E$ as the solution of

$$(V, M(U_t + E_t)) + (V, f(\cdot,t,U+E,U_x+E_x)) + A(V,U+E) = 0, \quad \text{for all } V \in \hat{S}_0^N, \quad t > 0, \tag{8a}$$

$$(V,E) = (V, u^0 - U), \quad \text{for all } V \in \hat{S}_0^N, \quad t = 0. \tag{8b}$$

The boundary term of (7a) does not appear in (8a) because the test functions $V$ vanish
at the nodes of $\pi(t,N)$, which include $x = a$ and $x = b$.

In constructing the error estimate $E(x,t)$, we assumed the superconvergence of
the piecewise linear finite element solution $U(x,t)$, i.e., we assumed that $U(x,t)$
converges at a faster rate on $\pi(t,N)$ than elsewhere on $a < x < b$. This
superconvergence property was established by Thomée [19] and the convergence of $E$
to $e$ has been proven for linear problems by Adjerid and Flaherty [4].

The error estimate $E(x,t)$ is used to control the refinement/coarsening strategy
and the motion of $\pi(t,N)$. We determine mesh motion by solving the ordinary
differential system

$$\dot{x}_i(t) - \dot{x}_{i-1}(t) = -\lambda (W_i - \bar{W}), \quad i = 1, 2, \ldots, N, \tag{9a}$$

where $\lambda$ is a non-negative parameter, $W_i$ is an error indicator on the subinterval
$(x_{i-1}, x_i)$, and $\bar{W}$ is the average of $W_i$, $i = 1, 2, \ldots, N$. We shall take $W_i$ to be the
square of the local error estimate in $H^1$, i.e.,

$$W_i(t) = \|E\|_{1,i}^2 := \int_{x_{i-1}}^{x_i} [E^T E + E_x^T E_x]\,dx; \tag{9b}$$

however, other local measures can be used [2].
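
As a concrete illustration of (9b), the following sketch evaluates the indicator for a scalar problem. It assumes, which the text does not state, that the quadratic error estimate on an element of length h is a single bubble E(x) = c·4ξ(1−ξ) with ξ = (x − x_{i−1})/h, so that E vanishes at both nodes. For that shape the two integrals in (9b) evaluate to c²·8h/15 and c²·16/(3h). The function names are hypothetical.

```python
def bubble_indicator(c, h):
    """Closed form of W_i = int (E^2 + E_x^2) dx for the assumed bubble
    E = c*4*xi*(1 - xi) on an element of length h (scalar case)."""
    return c * c * (8.0 * h / 15.0 + 16.0 / (3.0 * h))


def bubble_indicator_quadrature(c, h, n=2000):
    """The same quantity by composite midpoint quadrature, as an independent check."""
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) / n                      # midpoint in the reference variable
        e = c * 4.0 * xi * (1.0 - xi)           # E at the quadrature point
        e_x = c * 4.0 * (1.0 - 2.0 * xi) / h    # chain rule: d(xi)/dx = 1/h
        total += (e * e + e_x * e_x) * (h / n)
    return total
```

The 16/(3h) term dominates on small elements, so under this assumption the indicator is driven mainly by the derivative part of the H¹ norm.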

When $\lambda > 0$ and $W_i > \bar{W}$, the right-hand side of (9a) is negative and the
nodes $x_i$ and $x_{i-1}$ move closer to each other. Similarly, the nodes $x_i(t)$ and $x_{i-1}(t)$
move apart when $\lambda > 0$ and $W_i < \bar{W}$. Coyle et al. [16] studied the stability of (9a)
with respect to small perturbations from an equidistributing mesh (i.e., one where
$W_i(t) = \bar{W}(t)$, $i = 1, 2, \ldots, N$, $t \ge 0$) and showed that such perturbations
could only grow by a bounded amount when $\lambda > 0$, $W_i > 0$, $i = 1, 2, \ldots, N$, and
the velocity of the equidistributing mesh remained finite for $t \ge 0$. They further
showed that the mesh obtained by solving (9a) stayed closer to the equidistributing
mesh when $\lambda$ was large. This, however, introduces stiffness into the system, which
makes its solution expensive and, as noted, causes some difficulties with our error
estimate. Adjerid and Flaherty [2] studied (9) and developed an adaptive algorithm
for selecting $\lambda$ as a function of $t$ that balanced stiffness and equidistribution. The
procedure for selecting $\lambda$ will not be discussed further, but it has been used in the
examples of Section II.3.

In order to maintain sparsity, we eliminate $\bar{W}$ by combining (9a) on two
neighboring intervals and solve the scalar tridiagonal system

$$\dot{x}_{i-1} - 2\dot{x}_i + \dot{x}_{i+1} = -\lambda (W_{i+1} - W_i), \quad i = 1, 2, \ldots, N-1, \quad t > 0. \tag{10}$$

The ordinary differential equations resulting from (8) and (10) are solved using
the same backward difference software that is used to integrate the finite element
system (4).
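
To make the structure of the tridiagonal mesh-moving system concrete, the sketch below solves it for the node velocities at one instant, given the error indicators on the N elements. It is only an illustration: in the procedure described above the system is integrated together with the finite element equations by the stiff solver, not solved separately. The end nodes are held fixed because the partition requires x_0 = a and x_N = b; the function name and the use of the Thomas algorithm are choices made for this example.

```python
def mesh_velocities(W, lam):
    """Node velocities from the tridiagonal mesh-moving system.

    W[i-1] is the error indicator on element i (i = 1..N) and lam is the
    non-negative mesh-moving parameter. Returns N+1 velocities, including the
    two fixed end nodes.
    """
    n_el = len(W)
    m = n_el - 1                                  # number of interior nodes
    if m == 0:
        return [0.0, 0.0]
    # Right-hand side for interior node i: -lam * (W_{i+1} - W_i).
    rhs = [-lam * (W[i] - W[i - 1]) for i in range(1, n_el)]
    # Thomas algorithm for the constant tridiagonal matrix (1, -2, 1).
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = rhs[0] / -2.0
    for k in range(1, m):
        denom = -2.0 - c_prime[k - 1]
        c_prime[k] = 1.0 / denom
        d_prime[k] = (rhs[k] - d_prime[k - 1]) / denom
    v = [0.0] * m
    v[-1] = d_prime[-1]
    for k in range(m - 2, -1, -1):
        v[k] = d_prime[k] - c_prime[k] * v[k + 1]
    return [0.0] + v + [0.0]                      # x_0 = a and x_N = b do not move
```

With indicators [4, 1, 1, 1] and a unit parameter, the interior velocities come out negative, so the nodes drift toward the first element, where the indicator exceeds the average.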

II.2. Adaptive Algorithms. In addition to controlling mesh motion, the error
estimate $E$ described in Section II.1 is used as an error indicator in conjunction with
procedures that locally refine or coarsen the mesh. A top-level description of an
adaptive local refinement/coarsening algorithm is presented in Figure 1 in a
pseudo-PASCAL language. The procedure adapfem integrates the system (4, 8, 10) from
time tinitial to tfinal and attempts to keep the spatial error estimate
$\|E\|_1 \le TOL$, where TOL is a prescribed tolerance. The time steps that are
selected by the temporal integration routine (e.g., DASSL) are denoted as
Δt[m], m = 1, 2, ..., and the corresponding times are tout[m], m = 0, 1, ..., with
tout[0] initially set to tinitial. The integration is halted every nstep steps or when
tout[m] = tfinal and the arrays Δt and tout are recomputed with tout[0] reset to
the last computed time, i.e., tout[nstep] or tfinal.


procedure adapfem (tinitial, tfinal, nstep, TOL);
begin
   Calculate the initial conditions and an initial mesh;

   { Integrate the system from tinitial to tfinal. }

   m := 0;
   tout[0] := tinitial;
   while tout[m] < tfinal do
   begin
      m := m + 1;
      redone := false;
      Integrate (4, 8, 10) for one time step Δt[m];
      tout[m] := tout[m-1] + Δt[m];

      { Check the error estimate. }

      if (m = nstep) or (tout[m] = tfinal) then
      begin
         Compute a new value of λ, if necessary;

         { Refine the mesh. }

         while ||E(·, tout[m])||_1 > TOL do
         begin
            Add elements to the mesh;
            Redo the integration on the refined mesh from
               t = tout[0] to tout[m];
            redone := true
         end;

         { Coarsen or regenerate the mesh. }

         if (||E(·, tout[m])||_1 < TOL / 3) or (redone)
            then Delete elements from the mesh, if possible
            else Generate a new mesh, if necessary;

         tout[0] := tout[m];
         m := 0
      end { if m = nstep ... }
   end { while tout[m] < tfinal }
end { adapfem };


Figure 1. Top-level description of an adaptive finite element procedure with
mesh motion and/or local mesh refinement/coarsening.
The spatial error estimate $\|E\|_1$ is checked whenever the temporal integration is
halted. If $\|E\|_1 > TOL$, the last $m$ integration steps are rejected and the mesh is
refined by adding

$$M[i] := \max\{\, \mathrm{round}_\beta(\|E\|_{1,i} / \bar{E}) - 1,\ 0 \,\} \tag{11a}$$

elements uniformly to $(x_{i-1}, x_i)$, $i = 1, 2, \ldots, N$. Here,

$$\mathrm{round}_\beta(x) = \begin{cases} \mathrm{trunc}(x) + 1, & \text{if } x - \mathrm{trunc}(x) \ge \beta, \\ \mathrm{trunc}(x), & \text{otherwise}, \end{cases} \tag{11b}$$

where $0 < \beta < 1$, $\mathrm{trunc}(x)$ evaluates the integer part of $x$, and

$$\bar{E} := 0.9\, TOL / \sqrt{N}. \tag{11c}$$

The choice of $\beta = 0.2$ in (11) seemed to produce refined meshes that reliably reduced
$\|E\|_1$ to approximately TOL the next time that the error estimate was checked.
Further justification for this value of $\beta$ is given in Adjerid and Flaherty [4].
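
The refinement rule can be sketched in a few lines. The code below is an illustration only: the names are hypothetical, the local error norms are supplied as a plain list, and the per-element target is taken to be 0.9·TOL/√N, which is an assumed reading of a denominator that is poorly legible in the source.

```python
import math


def round_beta(x, beta=0.2):
    """Biased rounding: round up as soon as the fractional part reaches beta."""
    whole = math.trunc(x)
    return whole + 1 if x - whole >= beta else whole


def elements_to_add(local_errors, tol, beta=0.2):
    """Number of elements to add uniformly to each existing element.

    local_errors[i] is the local H^1 norm of the error estimate on element i+1.
    """
    n = len(local_errors)
    e_bar = 0.9 * tol / math.sqrt(n)   # assumption: target error per element
    return [max(round_beta(err / e_bar, beta) - 1, 0) for err in local_errors]
```

For two elements with local errors 0.3 and 0.05 and TOL = 0.2, the target is about 0.127, so the first element receives two additional elements and the second receives none; the refined mesh then has four elements.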

The integration is redone from tout[0] to tout[m], where $m$ is either nstep or
such that tout[m] = tfinal, on the refined mesh, which has

$$\tilde{N} = N + \sum_{i=1}^{N} M[i] \tag{12}$$

elements. This process is repeated until $\|E(\cdot, tout[m])\|_1 \le TOL$.

Elements can be deleted from a mesh whenever $\|E(\cdot, tout[m])\|_1 < TOL/3$ or
whenever refinement was necessary to integrate from tout[0] to tout[m]. The need
to refine often indicates that the spatial error pattern has changed and that fine
grids may no longer be needed in some portions of the domain. A mesh is coarsened
by uniting successive pairs of elements, $(x_{i-1}, x_i)$ and $(x_i, x_{i+1})$, when
$\|E(\cdot, tout[m])\|_{1,j} < TOL / (3\sqrt{N})$, $j = i, i+1$. This union of elements is only
performed when a significant percentage of elements may be removed from a mesh.
This strategy avoids the overhead associated with restarting the temporal integration
routine.
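
A minimal sketch of the coarsening test follows. Several details are assumptions made for illustration rather than taken from the text: the threshold is read as TOL/(3√N), pairs are chosen by a left-to-right scan so that no element belongs to two pairs, and the "significant percentage" is represented by a hypothetical parameter min_fraction.

```python
import math


def pairs_to_unite(local_errors, tol):
    """Indices (i, i+1) of neighboring elements whose local errors are both small."""
    n = len(local_errors)
    limit = tol / (3.0 * math.sqrt(n))   # assumed reading of the threshold
    pairs = []
    i = 0
    while i + 1 < n:
        if local_errors[i] < limit and local_errors[i + 1] < limit:
            pairs.append((i, i + 1))
            i += 2                        # an element joins at most one pair
        else:
            i += 1
    return pairs


def should_coarsen(local_errors, tol, min_fraction=0.1):
    """Coarsen only if enough elements would disappear (min_fraction is hypothetical)."""
    removable = len(pairs_to_unite(local_errors, tol))
    return removable >= min_fraction * len(local_errors)
```

Each united pair removes one element, so the count of pairs is the number of elements that coarsening would save.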

If $TOL/3 \le \|E(\cdot, tout[m])\|_1 \le TOL$, we continue the temporal integration
with the existing mesh unless its speed is too great and it is not close to
equidistributing the local error indicators. A mesh where the error indicators are not
equilibrated indicates that mesh motion and/or refinement are being performed in a
suboptimal manner and that a new mesh may be more efficient. We use the following
indicator to measure the effectiveness of a mesh $\pi(t,N)$ with respect to
equidistributing the local error indicators:


$\mu(\pi(t,N)) :=$ [right-hand side illegible in the source]    (13)


If $\pi$ is a mesh that equidistributes $W_j$, $j = 1, 2, \ldots, N$, then $W_j = \bar{W}$,
$j = 1, 2, \ldots, N$ and $\mu(\pi) = 0$. Larger values of $\mu$ indicate increasing departures
from equidistribution. For example, suppose all of the error is concentrated in the
first element, i.e., $W_1 = N\bar{W}$ and $W_j = 0$, $j = 2, 3, \ldots, N$. Then $\mu(\pi) \approx N$, which
we interpret as meaning that at least one element (the last one) will have to cross $N$
elements in order to equidistribute the error indicators. If all of the error were
concentrated in the $(N/2)$th element, then $\mu(\pi) \approx N/2$, indicating that one element
has to cross $N/2$ elements of $\pi$. Whenever the mesh speed is too fast and
$\mu(\pi(tout[m],N)) > 0.1N$, we generate a new mesh that approximately equidistributes
the error indicators by iteratively removing elements with small error indicators
and refining those having large error indicators.

Additional details of our procedures, such as the generation of new initial
conditions whenever the number of elements in a mesh changes, are as described in
Adjerid and Flaherty [2].

In Section II.3, we present some calculations performed on stationary meshes.
These were done by using a code based on adapfem with the mesh moving parameter
$\lambda = 0$. Additionally, we only generated new meshes when $\mu(\pi(tout[m],N)) > 0.4N$
in order to avoid excessive restarting of the temporal integration routine.

II.3. Computational Examples. We conclude this section by presenting
some examples that illustrate our adaptive strategies and also attempt to appraise
the relative advantages of mesh moving and local refinement. There are several
potential reasons why an adaptive procedure that combines mesh moving with
refinement would be very efficient. Mesh moving techniques are inexpensive relative
to refinement (cf. Arney and Flaherty [6]) and the use of mesh motion should reduce
the need for refinement. Mesh motion can also reduce the necessity of restarting the
temporal integrator, which is an important consideration in a method of lines
approach such as ours. Some refinement is essential, however, since mesh motion
alone cannot generally satisfy prescribed error tolerances. Furthermore, rapid mesh
motion, e.g., towards an evolving region of high error, can severely restrict time steps
and diminish the efficiency of an adaptive procedure (cf. Adjerid and Flaherty [2]).
Finally, many numerical techniques converge at higher rates on uniform meshes than
they do on nonuniform moving meshes.

There is, thus, a need to quantify the optimal use of mesh moving with local
refinement; however, this is a very difficult problem and there have been very few
attempts in this direction. Arney and Flaherty [6] presented some computational
results comparing mesh moving and local refinement procedures for two-dimensional
hyperbolic systems. Bieterman, Flaherty, and Moore [15] attempted to compare
adaptive local refinement and method of lines procedures for one-dimensional
parabolic problems and noted the difficulties in finding appropriate performance
measures. Herein, we apply a code based on our adaptive procedures to two
computational examples and compare results on moving and stationary meshes. We use the
total number of space-time cells needed to integrate the partial differential system from
tinitial to tfinal as a measure of performance. A similar measure of computational
complexity was used by Arney and Flaherty [6]. It has several apparent deficiencies,
such as not providing an indication of the effort devoted to the various segments of the
adaptive algorithm.

Example 1. Consider the linear heat conduction problem

$$u_t + u_x + g(x,t) = u_{xx}, \quad -1 < x < 1, \quad t > 0, \tag{14a}$$

$$u(x,0) = u^0(x), \quad -1 \le x \le 1, \tag{14b}$$

$$u(-1,t) = c_1(t), \quad u(1,t) = c_2(t), \quad t > 0. \tag{14c,d}$$

We select $g$, $u^0$, $c_1$, and $c_2$ so that the exact solution of (14) is

$$u(x,t) = 1 - \tfrac{1}{2}\{\tanh[10(x - t + 0.8)] + \tanh[20(x + 2t - 1.6)]\}. \tag{15}$$

Equation (15) represents two wave fronts initially centered at $x = -0.8$ and
$x = 1.6$ and moving with speeds 1 and -2, respectively. The center of the faster
front enters the domain $(-1,1)$ at $x = 1$ and $t = 0.3$.
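
The front positions quoted above follow from setting each tanh argument to zero. The short sketch below, with hypothetical function names, evaluates the exact solution of Example 1 and the two front centers so that the entry time t = 0.3 of the faster front can be checked directly.

```python
import math


def exact_solution(x, t):
    """Exact solution of Example 1: two tanh fronts moving in opposite directions."""
    slow = math.tanh(10.0 * (x - t + 0.8))         # front centered at x = -0.8 + t
    fast = math.tanh(20.0 * (x + 2.0 * t - 1.6))   # front centered at x = 1.6 - 2t
    return 1.0 - 0.5 * (slow + fast)


def front_centers(t):
    """Centers of the slow and fast fronts at time t (zeros of the tanh arguments)."""
    return (-0.8 + t, 1.6 - 2.0 * t)
```

At t = 0 the fast front is centered at x = 1.6, outside the domain, so the initial data contain only the slow front near x = -0.8.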

We solved (14) for $0 \le t \le 1.2$ using tolerances of $2^{-k}$, $k = 2, 3, 4, 5$, in $H^1$
with adaptive procedures on moving and stationary meshes. The total number of
space-time cells used on $0 \le t \le 1.2$, the exact error $\|e\|_1$ at $t = 1.2$, and the
effectivity index

$$\theta := \|E\|_1 / \|e\|_1 \tag{16}$$

at $t = 1.2$ are presented in Table 1. The moving and stationary mesh trajectories
that were used to solve (14) with a tolerance of 1/8 are shown in Figure 2.

Solutions on moving meshes used fewer than half of the space-time cells of those
on stationary meshes. A larger number of cells are needed with a stationary mesh
because the temporal integration must be restarted more often and more time steps
must be redone due to a failure to satisfy the error tolerance. In each case, the
actual error was less than the prescribed tolerance and fine meshes were concentrated
in high-error regions. The effectivity index is a common method of appraising
the performance of an error estimation technique (cf., e.g., Babuska et al. [9]).
Ideally, $\theta$ should not deviate appreciably from unity and should approach unity as
$N$ increases. The results of Table 1 suggest that this is the case. The performance
of our error estimate seems to be slightly better on a stationary mesh than on a
uniform mesh.


[Table 1 entries are largely illegible in the source. Legible fragments, in reading
order: "Moving Mesh"; 0.0848, 0.0903, 0.993; 1/16; 0.056?, 0.988; 1/32; 0.0282, 0.996.]

Table 1. Number of space-time cells on $0 \le t \le 1.2$, spatial discretization
error at $t = 1.2$, and effectivity index at $t = 1.2$ as functions of error
tolerance using stationary and moving mesh methods to solve Example 1.


Figure 2. Mesh trajectories used to solve Example 1 with a tolerance of
1/8 on stationary (upper) and moving (lower) meshes.

Example 2. Consider the reaction-diffusion system

$$u_t = u_{xx} - D u e^{-\delta/T}, \qquad L T_t = T_{xx} + \alpha D u e^{-\delta/T}, \quad 0 < x < 1, \quad t > 0, \tag{17a,b}$$

$$D = R e^{\delta} / (\alpha\delta), \tag{17c}$$

$$u(x,0) = T(x,0) = 1, \quad 0 \le x \le 1, \tag{17d,e}$$

$$u_x(0,t) = T_x(0,t) = 0, \quad u(1,t) = T(1,t) = 1, \quad t > 0. \tag{17f,g,h,i}$$

This model was studied by Kapila [17] and used to describe a single one-step
reaction (A → B) of a mixture in the region $0 < x < 1$. The quantity $u$ is the
mass fraction of the reactant, $T$ is the reactant temperature, $L$ is the Lewis number,
$\alpha$ is the heat release, $\delta$ is the activation energy, $D$ is the Damköhler number, and
$R > 0.88$ is the reaction rate.

When $L$ is near unity, the temperature slowly increases with a "hot spot" forming
at $x = 0$. At some time $t > 0$, ignition occurs and the temperature at $x = 0$
jumps rapidly from near unity to near $1 + \alpha$. A steep flame front then forms and
propagates towards $x = 1$ with speed proportional to an exponential factor [exponent
illegible in the source]. In practical problems, $\alpha$ is about unity and $\delta$ is large;
thus, the flame front moves exponentially fast after ignition. The solution tends to a
steady state once the flame has reached $x = 1$.

We solved (17) for $0 \le t \le 0.5$ with $\alpha = 1$, $\delta = 20$, and $R = 5$ using
tolerances of 0.2, 0.1, and 0.05 on stationary and moving meshes. The numbers of
space-time cells needed to solve these problems are presented in Table 2 and the mesh
trajectories for both the stationary and moving mesh calculations with a tolerance of
0.1 are shown in Figure 3. As in Example 1, the number of stationary space-time cells
is approximately double the number of moving space-time cells.
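
As a worked check of the parameter values, the sketch below evaluates the Damköhler number for heat release 1, activation energy 20, and reaction rate 5, assuming the system is read as D = R·exp(δ)/(αδ) with a reaction term D·u·exp(−δ/T); both the reading of the garbled equations and the function names are assumptions. Under that reading D is roughly 1.2e8, while the reaction term at the initial state u = T = 1 is only R/(αδ) = 0.25, which illustrates why the temperature rises slowly before ignition.

```python
import math


def damkohler(R, alpha, delta):
    """Damkohler number under the assumed reading D = R * exp(delta) / (alpha * delta)."""
    return R * math.exp(delta) / (alpha * delta)


def reaction_term(u, T, R, alpha, delta):
    """Reaction term D * u * exp(-delta / T) of the model, under the same reading."""
    return damkohler(R, alpha, delta) * u * math.exp(-delta / T)
```

Evaluating the term at a hotter state, for example T = 2, multiplies it by exp(δ/2), which is consistent with the rapid jump in temperature described for ignition.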


[Table 2 entries are largely illegible in the source. Legible fragments, in reading
order: "Tolerance"; 0.1; 30300; 0.05; 118100.]

Table 2. Number of space-time cells as a function of error tolerance to
solve Example 2 on $0 \le t \le 0.5$ using stationary and moving meshes.


Figure 3. Mesh trajectories used to solve Example 2 with a tolerance of 0.1
on stationary (upper) and moving (lower) meshes.


III. TWO-DIMENSIONAL ADAPTIVE PROCEDURES. Our finite
element, error estimation, and local refinement procedures for two-dimensional
partial differential systems closely parallel our one-dimensional methods and are briefly
summarized in Sections III.1 and III.2. The representation of data and its
management are much more complicated in two dimensions and we use a dynamic tree data
structure to store and retrieve information about the mesh, solution, and error
estimate. Similar structures have been used by other investigators (cf., e.g., Babuska et
al. [8-10] and Bank et al. [12-14]) to design adaptive procedures for elliptic systems
and they have been shown to be an effective means of reducing storage and access
overhead. The essential details of our tree structure are described in Section III.2
and a two-dimensional combustion problem, similar to Example 2, is presented in
Section III.3.

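III.1. Discrete System. A weak form of (1) is constructed in the manner
described in Section II.1 for one-dimensional problems. Thus, we seek to determine
$u \in H_E^1$ such that

$$(v, u_t) + (v, f(\cdot,t,u,\nabla u)) + A(v,u) = \int_{\partial\Omega} [v^T D^1 u_{x_1} \nu_1 + v^T D^2 u_{x_2} \nu_2]\,d\sigma, \quad \text{for all } v \in H_0^1, \quad t > 0, \tag{18a}$$

$$(v,u) = (v,u^0), \quad \text{for all } v \in H_0^1, \quad t = 0, \tag{18b}$$

where

$$(v,u) = \iint_\Omega v(x,t)^T u(x,t)\,dx_1\,dx_2, \tag{18c}$$

$$A(v,u) = \iint_\Omega [v_{x_1}^T D^1(x,t,u)\,u_{x_1} + v_{x_2}^T D^2(x,t,u)\,u_{x_2}]\,dx_1\,dx_2. \tag{18d}$$

Here $x = (x_1, x_2)$ and $\nu_1$, $\nu_2$ are the components of the normal to $\partial\Omega$.
We have set the mass matrix $M$ in (1) to the identity matrix for simplicity.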

The functions $u$ and $v$ are approximated by $U \in S_E^N \subset H_E^1$ and
$V \in S_0^N \subset H_0^1$, respectively, where $S_E^N$ and $S_0^N$ are spaces of bilinear polynomials
with respect to a piecewise rectangular partition of the rectangular domain $\Omega$. The
finite element solution $U$ is obtained by solving

$$(V, U_t) + (V, f(\cdot,t,U,\nabla U)) + A(V,U) = \int_{\partial\Omega} [V^T D^1 U_{x_1} \nu_1 + V^T D^2 U_{x_2} \nu_2]\,d\sigma, \quad \text{for all } V \in S_0^N, \quad t > 0, \tag{19a}$$

$$(V,U) = (V,u^0), \quad \text{for all } V \in S_0^N, \quad t = 0. \tag{19b}$$


As in the one-dimensional case, the spatial error $e(x,t) := u(x,t) - U(x,t)$ is
approximated by $E \in \hat{S}_E^N \subset H_E^1$. In two dimensions, we select the finite dimensional
space $\hat{S}_E^N$ to consist of piecewise cubic functions with respect to a piecewise
rectangular partition of $\Omega$. The cubic functions are biquadratic polynomials that are missing
their quartic terms (i.e., serendipity functions in the terminology of Zienkiewicz [20])
and further vanish at the vertices of each element. Thus, once again we take
advantage of nodal superconvergence to simplify our approximation of the discretization
error. However, there is very little theoretical justification of the superconvergence
property for two-dimensional problems and we are relying on computational evidence
[3] and our one-dimensional theory [4].


The approximate error $E$ is determined by replacing $u$ and $v$ in (18) by $U + E$
and $V \in \hat{S}_0^N \subset H_0^1$, respectively, where $\hat{S}_0^N$ is composed of the same cubic
functions as $\hat{S}_E^N$, and solving

$$(V, U_t + E_t) + (V, f(\cdot,t,U+E,\nabla(U+E))) + A(V,U+E) = \int_{\partial\Omega} [V^T D^1 (U+E)_{x_1} \nu_1 + V^T D^2 (U+E)_{x_2} \nu_2]\,d\sigma, \quad \text{for all } V \in \hat{S}_0^N, \quad t > 0, \tag{20a}$$

$$(V,E) = (V, u^0 - U^0), \quad \text{for all } V \in \hat{S}_0^N, \quad t = 0. \tag{20b}$$

The  resulting  ordinary  differential  equations  (19)  and  (20)  for  the  solution  and 
error  estimate  are  integrated  in  time  using  a  code  for  stiff  systems  (e.g.,  DASSL). 

III.2. Local Refinement Algorithms. A top-level description of our
two-dimensional adaptive procedure closely resembles the one-dimensional algorithm
shown in Figure 1, except that we have no mesh moving procedures, as yet.
Initially, the domain $\Omega$ is partitioned into a "base" mesh of $N \times M$ rectangular
elements, which is the coarsest mesh that can be used to solve the problem.
Refinement is performed by bisecting the edges of a coarser element, thus creating four
elements where there was previously one. A base mesh having four elements and a
refined mesh obtained by bisecting one of them are shown in Figure 4.

The refinement process may be repeated, i.e., elements may be bisected again to
create four new elements. Additionally, quartets of elements that were created by
refinement may be subsequently deleted if they are no longer needed to maintain
accuracy. Bilinear approximations in $S_E^N$ and $S_0^N$ and cubic approximations in $\hat{S}_E^N$
and $\hat{S}_0^N$ are constrained to be linear and quadratic, respectively, on edges between
elements of different levels in order to maintain continuity of $U$ and $E$ on $\Omega$.

The mesh is organized as a tree structure with the domain $\Omega$ being the root of
the tree and the $N \times M$ elements of the base mesh being offspring of the root. All
nonleaf nodes of the tree, other than the root node, have four offspring, which
correspond to the four elements created by refining the element that the node represents.
The domain $\Omega$ is referred to as level zero of the tree, the elements of the base mesh
are level one, and the levels increase as elements are recursively refined. The tree
structure for the mesh shown in the lower portion of Figure 4 is displayed in Figure 5.

Each node of the tree contains the following information:

i. the element number, say $k$, of the finite element,

ii. the level $l$ of the tree,

iii. pointers to the four vertex nodes of element $k$,

iv. pointers to the four midside nodes of element $k$, which are needed to represent $E$,

v. pointers to the four elements neighboring element $k$, with a null pointer used
when an edge of element $k$ is on the boundary,

vi. a pointer to the parent of element $k$, and

vii. pointers to the four sons of element $k$, with null pointers used when element $k$
is a leaf node of the tree.
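
The node record listed above maps naturally onto a small class. The sketch below is one possible layout, not the data structure of the original code: the class and method names are hypothetical, node pointers are represented by integer indices, and only elements at level one and below are modeled (the root, which holds the base mesh, is omitted). Neighbor pointers and the node lists of new sons are left unset, since filling them requires geometric bookkeeping that the text does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Element:
    number: int                                            # i.   element number k
    level: int                                             # ii.  level of the tree
    vertices: List[int] = field(default_factory=list)      # iii. four vertex nodes
    midsides: List[int] = field(default_factory=list)      # iv.  four midside nodes (carry E)
    neighbors: List[Optional["Element"]] = field(           # v.   None marks a boundary edge
        default_factory=lambda: [None] * 4, repr=False)
    parent: Optional["Element"] = field(default=None, repr=False)   # vi.
    sons: List[Optional["Element"]] = field(                 # vii. all None for a leaf
        default_factory=lambda: [None] * 4, repr=False)

    def is_leaf(self) -> bool:
        return all(son is None for son in self.sons)

    def refine(self, first_number: int) -> List["Element"]:
        """Create four sons one level down; numbering starts at first_number."""
        self.sons = [Element(number=first_number + q, level=self.level + 1, parent=self)
                     for q in range(4)]
        return self.sons

    def prune(self) -> None:
        """Delete the quartet of sons, turning this element back into a leaf."""
        self.sons = [None] * 4
```

Disabling the generated equality and hiding the pointer fields from the printed representation avoids infinite recursion through the parent and son links.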

Figure 4. A coarse mesh with four elements numbered 1 to 4 (top) and the
resulting mesh after refining element 1 (bottom).

Figure 5. Tree representation of the mesh shown in the lower portion of
Figure 4.

As in the one-dimensional algorithm of Figure 1, elements are added to a mesh
when $\|E\|_1 > TOL$ and deleted from a mesh when either $\|E\|_1 < TOL/3$ or when
refinement was necessary to integrate to the current time. Our refinement and
deletion procedures impose the following two rules, which Bank et al. [12-14] found to
aid the efficiency and accuracy of their refinement process for elliptic systems:

i. the 1-irregular rule, which states that neighboring elements can differ by at
most one level of the tree, and

ii. the 3-neighbor rule, which states that any element where the number of edges
containing elements at a higher level of the tree and the number of boundary
edges total three or more must be refined.

Refinement is performed by examining the elements of a mesh by levels,
proceeding from the root to the leaf nodes of the tree. An element $k$ is refined by
dividing it into four subelements whenever $\|E\|_{1,k} > TOL / \sqrt{N_e}$, where $N_e$ is the
number of elements in the mesh. Elements are deleted from meshes, other than the
base $N \times M$ mesh, by pruning the tree. A quartet of elements having the same
parent is deleted if:

i. every element in the quartet has no offspring,

ii. every neighbor of the elements in the quartet is at the same or a lower level of
the tree, and

iii. the average error estimate of the four elements is less than $TOL / (3\sqrt{N_e})$.
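
The two error thresholds used for refinement and deletion can be written as a pair of small predicates. This is a sketch with hypothetical names; it covers only the error test in condition iii, while conditions i and ii depend on the tree structure and are not modeled here.

```python
import math


def needs_refinement(local_error, tol, n_elements):
    """True if the local error norm on an element exceeds TOL / sqrt(N_e)."""
    return local_error > tol / math.sqrt(n_elements)


def quartet_error_small(quartet_errors, tol, n_elements):
    """True if the average error of a quartet is below TOL / (3 * sqrt(N_e))."""
    average = sum(quartet_errors) / len(quartet_errors)
    return average < tol / (3.0 * math.sqrt(n_elements))
```

With TOL = 0.2 and 16 elements, the refinement threshold is 0.05 and the deletion threshold is about 0.0167, so the factor of three leaves a band in which an element is neither refined nor removed.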
Additional details pertaining to other aspects of our adaptive procedures are
presented in Adjerid and Flaherty [3].

III.  3.  Computational  Example.  A  code  based  on  our  two-dimensional  local 
mesh  refinement  procedure  has  been  written  and  applied  to  several  problems  [3]. 
Herein,  we  present  the  results  of  a  two-dimensional  version  of  the  model  combustion 
problem  considered  in  Example  2. 

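Example 3. Consider the partial differential system on the rectangular domain
$\Omega := \{\, (x_1, x_2) \mid 0 < x_1, x_2 < 1 \,\}$

$$T_t = T_{x_1 x_1} + T_{x_2 x_2} + D(1 + \alpha - T)e^{-\delta/T}, \quad (x_1, x_2) \in \Omega, \quad t > 0, \tag{21a}$$

$$T(x,0) = 1, \quad x \in \Omega \cup \partial\Omega, \tag{21b}$$

$$T_{x_1}(0,x_2,t) = 0, \quad T(1,x_2,t) = 1, \quad 0 < x_2 < 1, \quad t > 0, \tag{21c,d}$$

$$T_{x_2}(x_1,0,t) = 0, \quad T(x_1,1,t) = 1, \quad 0 < x_1 < 1, \quad t > 0. \tag{21e,f}$$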

All of the parameters are as described in Example 2. The Lewis number $L$ has been
set to unity and, in this case, the mass fraction is $u = 1 + (1 - T)/\alpha$.

We solved (21) with $\alpha = 1$, $\delta = 20$, and $R = 5$ using a spatial error tolerance
of 0.2. Mesh refinement had to be restricted to a maximum of two levels because of
virtual memory restrictions on our computing system. The meshes that were created
at $t = 0.2867$, 0.2979, 0.3055, and 1 are shown in Figure 6. Surface and contour
plots of the calculated temperatures at $t = 0.28674$ and 0.3115 are presented in
Figures 7 and 8, respectively.

The temperature slowly increases until ignition occurs at approximately
$t = 0.28$. The temperature at the origin then jumps from near unity to near two.
A circularly shaped reaction front forms and moves radially with a speed of
approximately 30 towards the boundaries. A steady state is reached at about $t = 0.32$.
Refinement is confined to the vicinity of the reaction front. The results of Figures 7
and 8 show some small oscillations in the temperature ahead of the reaction front.
At present, we are unsure if these oscillations are caused by interpolation
inaccuracies in our plotting routines, inadequate resolution of the finite element solution due
to our restricting the number of levels of refinement, or an instability of the reaction
front. We plan to explore these matters further using a combination of numerical
and asymptotic techniques.

Our results on this difficult nonlinear problem are very encouraging; however,
we anticipate that greater efficiency could be achieved by combining local mesh
refinement with mesh moving as in the one-dimensional procedures described in
Section II.

IV.  DISCUSSION  OF  RESULTS  AND  CONCLUSIONS.  We  have 
developed  adaptive  local  mesh  refinement  finite  element  procedures  for  solving  vector 
systems  of  parabolic  partial  differential  equations  in  one  and  two  dimensions.  The 
nodal  superconvergence  property  of  the  finite  element  method  on  parabolic  systems 
has been used to calculate an estimate of the spatial discretization error, and mesh
motion has been combined with local mesh refinement for one-dimensional problems.

Figure 6. Meshes that were used for Example 3 at t = 0.2867 (upper left),
0.2979 (upper right), 0.3055 (lower left), and 1 (lower right).

Examples  1  and  2  were  designed  to  illustrate  the  performance  of  our  one¬ 
dimensional  procedures  and  to  characterize  the  importance  of  mesh  motion  as  an 
adaptive  technique  relative  to  mesh  refinement.  These  experiments  indicate  that  our 
combination  of  mesh  moving  and  refinement  can  obtain  solutions  with  about  one- 
half  the  total  number  of  space-time  cells  of  calculations  performed  using  only  refine¬ 
ment.  We  emphasize  the  preliminary  nature  of  these  results.  Many  more  experi¬ 
mental  and  theoretical  investigations  will  be  necessary  before  firm  conclusions  can  be 
reached  regarding  the  optimal  combination  of  mesh  motion  and  refinement. 
Appropriate  performance  measures  and  optimality  conditions  are  yet  to  be  specified. 
There  is  also  a  strong  temptation  to  compare  coded  implementations  of  procedures 
and,  at  this  stage,  we  are  interested  in  more  theoretical  bounds  on  an  algorithm's 
performance. 

Comparisons  of  the  exact  and  estimated  errors,  presented  in  Example  1  and  in 
[1-4],  give  us  some  confidence  in  the  accuracy  of  our  error  estimate.  Additionally, 
the  results  of  Example  3  provide  an  indication  of  the  robustness  of  our  methods. 
This is a difficult nonlinear two-dimensional problem; yet, we solved it without
intervention and a priori knowledge of the solution. No special initial mesh was used,
fine meshes were automatically added to the vicinity of the reaction front, and the
fine meshes followed the dynamics of the problem. Indeed, our one- and
two-dimensional techniques seem to be well-suited for the automatic solution of
reaction-diffusion systems.

Despite our preliminary success, there is a great deal more that should be done
to justify and improve the performance of our procedures. As noted, rigorous
analyses of the convergence of our error estimate to the true error have only been done
for one-dimensional linear parabolic problems on stationary meshes [4]. Dimensional,
nonlinear, and refinement effects should be included in a complete analysis. This is a
difficult task, as very few analyses of two-dimensional time dependent problems with
refinement have appeared in the literature.

Several computational procedures in our approach might also be improved. For
example, a sparse Gaussian elimination procedure was used to solve the linear
algebraic systems associated with the temporal integration of (4), (8), and (10) in one
dimension and (19) and (20) in two dimensions. The solution of linear systems is a
significant part of the total computational effort, and it is possible that iterative
schemes, such as multigrid methods, could substantially improve performance and
reduce storage. Multigrid iteration was used successfully in the adaptive PLTMG
package for elliptic systems by Bank et al. [13].

We are also studying the addition of mesh moving capabilities to our
two-dimensional algorithm, the use of higher-order finite element approximations, and
implementations of our procedures on vector and parallel computers. A simple,
stable and explicit mesh moving technique, that may be useful for our purposes, was
developed by Arney and Flaherty [5] for two-dimensional hyperbolic systems. This
procedure dramatically reduced errors and enhanced the resolution of their solutions
(cf. Arney and Flaherty [6]). We are developing procedures for two-dimensional
parabolic problems that use piecewise biquadratic finite element approximations as
solution spaces and piecewise cubic approximations as error estimates. Babuska [7]
has shown that the error associated with even-degree polynomial finite element
approximations for elliptic problems is principally due to the error in the interior of
the element. Thus, the error on element boundaries may be neglected. Babuska and
Yu [11] have implemented procedures for elliptic systems based on this theory and
we are studying their utility for parabolic problems. Finally, our tree structure is
well-suited for parallel computation and we are exploring its use on a variety of
parallel computing systems.

ABSTRACT. Superconvergence properties and quadratic polynomials are used
to  derive  a  computationally  inexpensive  approximation  to  the  spatial  component 
of  the  error  in  a  piecewise  linear  finite  element  method  for  one-dimensional 
parabolic  partial  differential  equations.  This  technique  is  coupled  with  time 
integration  schemes  of  successively  higher  orders  to  obtain  an  approximation  of 
the  temporal  and  total  discretization  errors.  Computational  results  indicate 
that  these  approximations  converge  to  the  exact  discretization  errors  as  the 
mesh  is  refined.  The  approximate  errors  are  used  to  control  an  adaptive  mesh 
refinement  strategy. 

I. INTRODUCTION. Adjerid and Flaherty [1,2] developed an a posteriori
estimate of the spatial discretization error in a finite element method of lines
for solving vector systems of parabolic partial differential equations. They
discretized the system in space using Galerkin's method with piecewise linear
finite element approximations. The error estimate was calculated using
Galerkin's method with piecewise quadratic functions. A nodal superconvergence
property of the finite element method was used to neglect errors at nodes and,
thus, improve computational efficiency. Ordinary differential equations (ODEs)
for the finite element solution and error estimate were then integrated in time
using the backward difference code DASSL [3].

Adjerid and Flaherty [1,2] assumed that the temporal discretization error
associated with DASSL was negligible compared to the spatial error. Thus, their
estimate of the spatial discretization error could be regarded as an estimate of
the total error. They used their error estimate to control mesh moving and
local mesh refinement procedures that attempted to equidistribute the error
estimate and satisfy a prescribed global error tolerance. Similar mesh
refinement strategies have been used by Bieterman and Babuska [4,5].

Our goal is to develop techniques that simultaneously estimate the temporal
and spatial discretization errors. To this end, we consider M-dimensional
partial differential systems of the form

$$u_t(x,t) + f(x,t,u,u_x) = (D(x,t)\,u_x(x,t))_x, \quad a < x < b, \quad t > 0, \tag{1a}$$

subject to the initial conditions

$$u(x,0) = u^0(x), \quad a < x < b, \tag{1b}$$

and boundary conditions

$$A_L(t)\,u(a,t) + B_L(t)\,u_x(a,t) = g_L(t),$$
$$A_R(t)\,u(b,t) + B_R(t)\,u_x(b,t) = g_R(t), \quad t > 0. \tag{1c}$$

The variables x and t represent spatial and temporal coordinates and denote
partial differentiation when they are used as subscripts; $u$, $f$, $u^0$, $g_L$, and $g_R$ are
M-vectors; and $D$, $A_L$, $B_L$, $A_R$, and $B_R$ are $M \times M$ matrices.

We,  like  Adjerid  and  Flaherty  [1,2],  discretize  Eq.  (1)  in  space  using 
Galerkin's  method  with  piecewise  linear  finite  elements.  Temporal  discretiza¬ 
tion,  however,  is  performed  by  the  backward  Euler  method  as  opposed  to  using  an 
ODE  code.  A  second  solution  is  calculated  using  trapezoidal  rule  integration  in 
time  and  the  difference  between  the  two  solutions  is  used  to  furnish  an  estimate 
of  the  temporal  discretization  error.  A  third  solution  is  obtained  using 
Adjerid  and  Flaherty's  [1,2]  quadratic  finite  elements  and  the  trapezoidal  rule 
in  time.  This  solution  is  higher  order  in  space  and  time  than  the  original 
piecewise  linear  finite  element-backward  Euler  solution.  Hence,  it  can  be  used 
to  provide  an  estimate  of  the  total  discretization  error  of  the  piecewise  linear 
finite  element-backward  Euler  solution.  Furthermore,  the  difference  between  the 
piecewise linear and quadratic solutions calculated by the trapezoidal rule can
be  used  to  furnish  an  estimate  of  the  spatial  discretization  error. 

At  first  sight,  the  above  procedure  seems  to  be  very  expensive;  however, 
nodal  superconvergence  significantly  reduces  computational  complexity.  Defect 
correction methods can also be used to reduce costs associated with the temporal
integration. 

The estimates of the temporal, spatial, and total discretization errors of
the piecewise linear finite element-backward Euler solution are used to control
a global refinement procedure that attempts to keep an estimate of the total
discretization error per time step in $H^1$ below a prescribed limit. Depending on
the proportions of the temporal and spatial error estimates to the total error
estimate, we refine the time step, finite element mesh, or both.

The piecewise linear and quadratic finite element procedures and the
temporal integration schemes are described in Section II. Our error estimation
procedures are presented in Section III. Adjerid and Flaherty [6] proved that
their spatial error estimate converges to the exact error as the mesh is refined
when temporal integration is exact for linear parabolic problems. Similar
results have not yet been established when temporal errors are present; however,
computational results of Section III indicate that convergence of our temporal,
spatial, and total error estimates is likely. Our global refinement strategy
is presented in Section IV and it is applied to an unstable heat conduction
problem. Finally, in Section V we discuss our results and suggest some future
investigations.

II. DISCRETE SYSTEM. We simplify the presentation slightly by assuming
that only Dirichlet data is prescribed; thus, $B_L(t) = B_R(t) = 0$, $t > 0$, in Eq.
(1c). A weak form of Eq. (1) is then constructed by multiplying Eq. (1a) by a
test function $v(x,t) \in H_0^1$, integrating the result with respect to x from a to b,
and integrating the diffusive term by parts to obtain

$$(v,u_t) + (v,f) + A(v,u) = 0, \quad t > 0, \quad \text{for all } v \in H_0^1. \tag{2a}$$
The inner product $(v,u)$ and strain energy $A(v,u)$ are defined as

$$(v,u) = \int_a^b v^T u \, dx, \qquad A(v,u) = \int_a^b v_x^T D\, u_x \, dx. \tag{2b,c}$$

Functions $v$ belonging to $H_0^1$ are required to have finite values of $(v,v)$ and
$(v_x,v_x)$ and vanish at $x = a$ and $b$. Any weak solution $u \in H_E^1$ of Eq. (2a) must
also satisfy the Dirichlet (essential) boundary conditions

$$u(a,t) = A_L^{-1}(t)\,g_L(t), \qquad u(b,t) = A_R^{-1}(t)\,g_R(t), \quad t > 0, \tag{2d,e}$$

and initial conditions obtained by multiplying Eq. (1b) by $v$ and integrating
with respect to x, i.e.,

$$(v,u) = (v,u^0), \quad t = 0, \quad \text{for all } v \in H_0^1. \tag{2f}$$

A discrete version of the weak system, Eq. (2), is constructed by using
finite element-Galerkin procedures in space (cf. Section II.1) and finite
difference techniques in time (cf. Section II.2).

II.1. Spatial Discretization. In order to discretize Eq. (2a) in space,
we introduce a partition

$$\Pi_N := \{a = x_0 < x_1 < \cdots < x_N = b\} \tag{3}$$

of $(a,b)$ into N subintervals $(x_{i-1},x_i)$, $i = 1,2,\ldots,N$, and approximate $u$ and $v$
by piecewise polynomial functions $U$ and $V$, respectively, with respect to this
partition. Thus, the spatially-discrete form of Eq. (2) consists of finding
$U \in S_E^N \subset H_E^1$ such that

$$(V,U_t) + (V,f) + A(V,U) = 0, \quad t > 0, \quad \text{for all } V \in S_0^N \subset H_0^1, \tag{4a}$$

$$(V,U) = (V,u^0), \quad t = 0, \quad \text{for all } V \in S_0^N \subset H_0^1. \tag{4b}$$

The spaces $S_E^N$ and $S_0^N$ will be chosen to consist of either piecewise linear
or piecewise quadratic polynomial functions. The spaces of piecewise linear
polynomials are denoted as $S_E^{N,1}$ and $S_0^{N,1}$ and are easily constructed in terms
of the familiar "hat" functions

$$\phi_i(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}}, & x_{i-1} \le x < x_i, \\[2ex] \dfrac{x_{i+1} - x}{x_{i+1} - x_i}, & x_i \le x < x_{i+1}, \\[2ex] 0, & \text{otherwise}, \end{cases} \qquad i = 0,1,\ldots,N. \tag{5}$$

The piecewise linear finite element solution is written in the form

$$U^1(x,t) = \sum_{i=0}^{N} c_i(t)\,\phi_i(x) \tag{6}$$
and determined by solving the ordinary differential system

$$(V^1,U^1_t) + (V^1,f) + A(V^1,U^1) = 0, \quad t > 0, \quad \text{for all } V^1 \in S_0^{N,1}, \tag{7a}$$

$$(V^1,U^1) = (V^1,u^0), \quad t = 0, \quad \text{for all } V^1 \in S_0^{N,1}, \tag{7b}$$

where the piecewise linear test functions $V^1 \in S_0^{N,1}$ have a form similar to Eq.
(6).


Piecewise quadratic approximations $U^2 \in S_E^{N,2}$ are constructed by adding a
"hierarchical" correction $E^2(x,t)$ to $U^1$, i.e.,

$$U^2(x,t) = U^1(x,t) + E^2(x,t), \tag{8a}$$

where

$$E^2(x,t) = \sum_{i=1}^{N} d_{i-1/2}(t)\,\psi_{i-1/2}(x). \tag{8b}$$

The basis $\psi_{i-1/2}(x)$, $i = 1,2,\ldots,N$, for the quadratic correction has the form

$$\psi_{i-1/2}(x) = \begin{cases} \left(\dfrac{x - x_{i-1}}{x_i - x_{i-1}}\right)\left(\dfrac{x_i - x}{x_i - x_{i-1}}\right), & x_{i-1} \le x \le x_i, \\[2ex] 0, & \text{otherwise}, \end{cases} \qquad i = 1,2,\ldots,N. \tag{9}$$
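As a small illustration of the two basis families in Eqs. (5) and (9), the following sketch evaluates a hat function and a quadratic correction function on an arbitrary partition. The function names `hat` and `bubble`, the sample partition, and the closing of the last interval at x = b are assumptions made for this example only.

```python
def hat(i, x, nodes):
    """Piecewise linear "hat" function phi_i of Eq. (5) on the partition `nodes`."""
    n = len(nodes) - 1
    if i > 0 and nodes[i - 1] <= x < nodes[i]:
        return (x - nodes[i - 1]) / (nodes[i] - nodes[i - 1])
    if i < n and nodes[i] <= x < nodes[i + 1]:
        return (nodes[i + 1] - x) / (nodes[i + 1] - nodes[i])
    if i == n and x == nodes[n]:
        # Eq. (5) uses half-open intervals; close the last one at x = b.
        return 1.0
    return 0.0


def bubble(i, x, nodes):
    """Quadratic correction psi_{i-1/2} of Eq. (9) on element (x_{i-1}, x_i), i = 1..N."""
    left, right = nodes[i - 1], nodes[i]
    if left <= x <= right:
        h = right - left
        return ((x - left) / h) * ((right - x) / h)
    return 0.0


nodes = [0.0, 0.25, 0.5, 1.0]   # a small nonuniform partition of (0, 1)
x = 0.6

# The hat functions sum to one at any point of the interval.
hat_sum = sum(hat(i, x, nodes) for i in range(len(nodes)))

# The quadratic correction vanishes at every node of the partition.
bubble_at_nodes = [bubble(2, xi, nodes) for xi in nodes]

# Value of the correction at the midpoint of element (0.5, 1.0).
bubble_mid = bubble(3, 0.75, nodes)
```

Because each correction function vanishes at the nodes, adding the correction of Eq. (8b) leaves the nodal values of the piecewise linear solution unchanged, which is what makes the representation hierarchical.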

Piecewise quadratic solutions are determined by solving

$$(V^2,U^2_t) + (V^2,f) + A(V^2,U^2) = 0, \quad t > 0, \quad \text{for all } V^2 \in S_0^{N,2}, \tag{10a}$$

$$(V^2,U^2) = (V^2,u^0), \quad t = 0, \quad \text{for all } V^2 \in S_0^{N,2}, \tag{10b}$$

where, once again, $V^2$ has a form similar to Eq. (8).


II.2. Temporal Discretization. The finite element systems, Eqs. (4), (7),
or (10), are discretized in time for the time step $[t_{n-1},t_n]$ using a weighted
two-step method, which for Eq. (4) has the form

$$\left(V, \frac{U^n - U^{n-1}}{\Delta t_n}\right) + \theta\,[(V,f^n) + A(V,U^n)] + (1-\theta)\,[(V,f^{n-1}) + A(V,U^{n-1})] = 0, \quad \text{for all } V \in S_0^N. \tag{11}$$

The scalar parameter $\theta$ is selected on [0,1], $U^n(x) := U(x,t_n)$, etc., and $\Delta t_n :=
t_n - t_{n-1}$. For simplicity, the test function V has been assumed to be
independent of time, although this will not be strictly correct when refinement is
incorporated into the finite element method (cf. Section IV).
The two particular choices of $\theta$ that are appropriate for our investigation
are $\theta = 1$, which yields the backward Euler method, and $\theta = 1/2$, which yields the
trapezoidal rule. It is well known [7] that the local discretization error of
the backward Euler method is $O(\Delta t_n^2)$ and that of the trapezoidal rule is
$O(\Delta t_n^3)$. We will use this difference in the orders of accuracy of the two
methods to estimate the local temporal discretization error of the finite
element solution.
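The difference in local orders can be checked on a scalar model problem. The sketch below applies the weighted scheme to y' = -y, which is an assumed stand-in for the finite element system and not the system of Eq. (11) itself; the names `theta_step` and `local_error` are hypothetical.

```python
import math


def theta_step(y_prev, dt, theta, lam=1.0):
    """One step of the weighted scheme for y' = -lam * y.

    theta = 1 gives backward Euler, theta = 1/2 the trapezoidal rule.
    """
    return y_prev * (1.0 - (1.0 - theta) * dt * lam) / (1.0 + theta * dt * lam)


def local_error(dt, theta):
    """Error after a single step started from the exact value y(0) = 1."""
    return abs(math.exp(-dt) - theta_step(1.0, dt, theta))


# Halving dt should divide the local error by about 2**2 for backward
# Euler and by about 2**3 for the trapezoidal rule.
dt = 0.01
ratio_be = local_error(dt, 1.0) / local_error(dt / 2, 1.0)
ratio_tr = local_error(dt, 0.5) / local_error(dt / 2, 0.5)

# The difference of the two one-step results approximates the backward
# Euler local error; this is the idea behind the temporal estimate.
estimate = abs(theta_step(1.0, dt, 0.5) - theta_step(1.0, dt, 1.0))
true_be_error = local_error(dt, 1.0)
```

For this model problem the estimate and the true backward Euler error agree to within a fraction of a percent at dt = 0.01, because the trapezoidal error is one order smaller.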

III.  ERROR  ESTIMATION.  Local  and  global  estimates  of  the  discretization 
error  have  been  successfully  used  to  control  refinement  algorithms  that  attempt 
to  solve  partial  differential  systems  to  prescribed  levels  of  accuracy 
[1,2,4-6,8-12].  Our  goal  is  to  estimate  the  discretization  error  per  time  step 
in  solutions  of  Eq.  (2)  obtained  by  using  piecewise  linear  finite  element 
approximations in space and the backward Euler method in time. It seems most
appropriate to gauge errors

$$e := u - U \tag{12}$$

in the $H^1$ norm

$$\|e\|_1 := \left[\int_a^b (e^T e + e_x^T e_x)\,dx\right]^{1/2}; \tag{13}$$

however, other measures may also be used. An error estimate that is global in
space and local in time may at first seem unusual, but it is commonly used when
spatial finite element approximations are combined with temporal finite
difference methods (cf., e.g., Thomée [13]).

Let the piecewise linear finite element solution obtained by using backward
Euler temporal integration be denoted as $U_{BE}^{1,n}(x)$ at time $t_n$. Likewise, let
$U_T^{1,n}(x)$ and $U_T^{2,n}(x)$ denote solutions obtained at $t_n$ by trapezoidal rule
integration with piecewise linear and quadratic approximations, respectively.

It is known [14] that $\|u(\cdot,t_n) - U_{BE}^{1,n}(\cdot)\|_1 = O(\Delta t_n^2) + O(\Delta t_n/N)$. Since
$\|u(\cdot,t_n) - U_T^{2,n}\|_1 = O(\Delta t_n^3) + O(\Delta t_n/N^2)$, we should be able to use the
difference between $U_T^{2,n}$ and $U_{BE}^{1,n}$ to estimate the error in $U_{BE}^{1,n}$; thus,

$$\|u - U_{BE}^{1,n}\|_1 \le \|U_T^{2,n} - U_{BE}^{1,n}\|_1 + \|u - U_T^{2,n}\|_1 \le \|U_T^{2,n} - U_{BE}^{1,n}\|_1 + O(\Delta t_n^3) + O(\Delta t_n/N^2). \tag{14}$$

The main problem in using $\|U_T^{2,n} - U_{BE}^{1,n}\|_1$ as an a posteriori estimate of
$\|u - U_{BE}^{1,n}\|_1$ is the computational effort required to obtain $U_T^{2,n}$. This cost can be
reduced considerably by using the superconvergence property of the finite
element method for one-dimensional parabolic systems. In the present context,
superconvergence implies that finite element solutions converge at a faster rate
on $\Pi_N$ than elsewhere on $(a,b)$. Hence, the error at the nodes may be neglected
relative to the error in the interior of the elements when N is sufficiently
large.

Nodal  superconvergence  has  been  used  by  several  investigators  as  a  means 
of  constructing  a  posteriori  error  estimates  in  finite  element  approximations. 
In particular, Adjerid and Flaherty [1,2] used it in conjunction with their
adaptive  finite  element  method  of  lines.  Their  situation  was  somewhat  more 
restrictive  than  ours  as  they  also  required  the  temporal  error  to  be  negligible 
relative  to  the  spatial  error. 

The use of the nodal superconvergence property enables us to approximate
$U_T^{2,n}$ as

$$U_T^{2,n} \approx U_T^{1,n} + E_T^{2,n} \tag{15}$$

where $U_T^{1,n}$ is obtained by solving Eq. (7) using trapezoidal rule integration
and $E_T^{2,n}$ is obtained by solving Eq. (10a) by trapezoidal rule integration with
$U^2$ replaced by Eq. (15). Furthermore, it is only necessary to test Eq. (10a)
against functions $V^2 \in \hat{S}_0^{N,2}$, where $\hat{S}_0^{N,2}$ is a space of quadratic polynomials that
vanish on $\Pi_N$.

To summarize, our procedure for obtaining the finite element solution
$U_{BE}^{1,n}$ and its error estimate $U_T^{1,n} + E_T^{2,n} - U_{BE}^{1,n}$ for the time step $[t_{n-1},t_n]$
consists of:

(i) discretizing Eq. (7a) by the backward Euler method and determining $U_{BE}^{1,n}$
as the solution of

$$\left(V^1, \frac{U_{BE}^{1,n} - U_{BE}^{1,n-1}}{\Delta t_n}\right) + (V^1, f(\cdot,t_n,U_{BE}^{1,n})) + A(V^1,U_{BE}^{1,n}) = 0, \quad \text{for all } V^1 \in S_0^{N,1}, \tag{16a}$$

(ii) discretizing Eq. (7a) by the trapezoidal rule and determining $U_T^{1,n}$ as
the solution of

$$\left(V^1, \frac{U_T^{1,n} - U_{BE}^{1,n-1}}{\Delta t_n}\right) + \tfrac12\,[(V^1, f(\cdot,t_n,U_T^{1,n})) + A(V^1,U_T^{1,n}) + (V^1, f(\cdot,t_{n-1},U_{BE}^{1,n-1})) + A(V^1,U_{BE}^{1,n-1})] = 0, \quad \text{for all } V^1 \in S_0^{N,1}, \tag{16b}$$

(iii) discretizing Eq. (10a) by the trapezoidal rule and determining $E_T^{2,n}$ as
the solution of

$$\left(V^2, \frac{U_T^{1,n} + E_T^{2,n} - U_{BE}^{1,n-1} - E_T^{2,n-1}}{\Delta t_n}\right) + \tfrac12\,[(V^2, f(\cdot,t_n,U_T^{1,n} + E_T^{2,n})) + A(V^2,U_T^{1,n} + E_T^{2,n}) + (V^2, f(\cdot,t_{n-1},U_{BE}^{1,n-1} + E_T^{2,n-1})) + A(V^2,U_{BE}^{1,n-1} + E_T^{2,n-1})] = 0, \quad \text{for all } V^2 \in \hat{S}_0^{N,2}. \tag{16c}$$

Temporal error estimation is local; thus, we use $U_{BE}^{1,n-1}$ as an initial
condition for the trapezoidal rule integrations in Eqs. (16b) and (16c). Nodal
superconvergence and the hierarchical formulation have uncoupled the piecewise
linear and quadratic components of $U_T^{2,n}$. The spatial error estimate $E_T^{2,n}$ on
the subinterval $(x_{i-1},x_i)$ is furthermore uncoupled from the error on other
subintervals and this significantly reduces the computational complexity
associated with solving Eq. (16c). The solution of Eq. (16b), noted in step (ii),
is necessary in order to increase the temporal accuracy of the solution because
superconvergence only increases the order of accuracy in space. Some
computational savings can generally be obtained, especially for nonlinear problems,
by calculating $U_T^{1,n}$ as a defect correction to the backward Euler solution
$U_{BE}^{1,n}$.


As described above,

$$\epsilon^n := \|U_T^{1,n} + E_T^{2,n} - U_{BE}^{1,n}\|_1 \tag{17}$$

furnishes an estimate to the error $\|u - U_{BE}^{1,n}\|_1$ of the backward Euler solution.
Equation (17) suggests the inequality

$$\epsilon^n \le \|U_T^{1,n} - U_{BE}^{1,n}\|_1 + \|E_T^{2,n}\|_1. \tag{18}$$

The term $\|U_T^{1,n} - U_{BE}^{1,n}\|_1$ is the difference between two piecewise linear
solutions computed with temporal integration schemes of different orders and can be
regarded as a measure of the temporal discretization error. In a similar
manner, $\|E_T^{2,n}\|_1$ can be regarded as a measure of the spatial discretization error.
Indeed, when the finite element system, Eq. (7), is integrated exactly in time,
Adjerid and Flaherty [6] proved that $\|E^2\|_1$ converges to the exact spatial
discretization error $\|u - U^1\|_1$ as $N \to \infty$ for linear parabolic problems.

We conclude this section by presenting an example that indicates that $\epsilon^n$,
$\|U_T^{1,n} - U_{BE}^{1,n}\|_1$, and $\|E_T^{2,n}\|_1$ provide good estimates of the total, temporal,
and spatial discretization errors, respectively.

Example 1: Consider the linear heat conduction problem

$$u_t = u_{xx}/\pi^2, \quad 0 < x < 1, \quad t > 0, \tag{19a}$$

$$u(x,0) = \sin \pi x, \quad 0 < x < 1, \tag{19b}$$

$$u(0,t) = u(1,t) = 0, \quad t > 0. \tag{19c,d}$$

The exact solution of this simple problem is

$$u(x,t) = e^{-t} \sin \pi x. \tag{20}$$

We solved Eq. (19) on a uniform mesh with N finite elements for one time
step $\Delta t$ using the methods described above and several choices of N and $\Delta t$. The
effectivity index

$$\Theta := \epsilon^1 / \|u(\cdot,\Delta t) - U_{BE}^{1,1}\|_1 \tag{21}$$

(cf., e.g., Babuska et al. [15]) is used as a means of gauging the accuracy of
the error estimate $\epsilon^1$. Ideally, we would like $\Theta$ not to differ appreciably from
unity and to approach unity as $N \to \infty$ and $\Delta t \to 0$.

We present a summary of results for the reciprocal of the effectivity index
for a sequence of calculations performed with $N = 2^{p+1}$ and $\Delta t = 5^{-p}/2$, $p =
0,1,\ldots,5$, in Figure 1. These results strongly suggest that $\Theta \to 1$ as $p \to \infty$.

We use the temporal effectivity index

$$\Theta_t := \|U_T^{1,1} - U_{BE}^{1,1}\|_1 / \|u(\cdot,\Delta t) - U_{BE}^{1,1}\|_1 \tag{22}$$

as a method of appraising the accuracy of the temporal error estimate
$\|U_T^{1,1} - U_{BE}^{1,1}\|_1$. For fixed $\Delta t$, $\Theta_t \to K_t(\Delta t)$ as $N \to \infty$ and the limiting value
$K_t(\Delta t) \to 1$ as $\Delta t \to 0$.

We solve Eq. (19) for a single time step using a sequence of meshes with
$N = 2^p$, $p = 3,4,\ldots,10$, finite elements and time steps of $\Delta t$ = 0.7, 0.49, 0.343,
and 0.2401. We present our findings for the temporal effectivity index $\Theta_t$ as a
function of p for the four time steps in Figure 2. As expected, $\Theta_t$ tends to a
limiting value $K_t(\Delta t)$ for large N, which approaches unity as $\Delta t \to 0$.
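A rough feel for the limiting behavior can be obtained from a scalar model. The sketch below assumes that, as N grows, the semi-discrete solution of Example 1 reduces to the single mode sin πx obeying y' = -y, so that the ratio of norms in the temporal effectivity index becomes a ratio of scalar amplitudes. This reduction is a simplifying assumption of the example, not a computation taken from the paper, and the function name `k_t` is hypothetical.

```python
import math


def k_t(dt):
    """Ratio |trapezoidal - backward Euler| / |exact - backward Euler|
    after one step of size dt for the mode y' = -y, y(0) = 1."""
    backward_euler = 1.0 / (1.0 + dt)
    trapezoidal = (1.0 - dt / 2.0) / (1.0 + dt / 2.0)
    exact = math.exp(-dt)
    return abs(trapezoidal - backward_euler) / abs(exact - backward_euler)


time_steps = [0.7, 0.49, 0.343, 0.2401]   # the four steps used in the text
ratios = [k_t(dt) for dt in time_steps]
for dt, r in zip(time_steps, ratios):
    print(f"dt = {dt:<6}  ratio = {r:.4f}")
```

Under this assumption the ratio stays above one for these four steps and decreases toward one as the step shrinks, in line with the stated trend of the limiting value.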

Finally, we define the spatial effectivity index as

$$\Theta_s := \|E_T^{2,1}\|_1 / \|u(\cdot,\Delta t) - U_{BE}^{1,1}\|_1 \tag{23}$$

and use it as a measure of the spatial error estimate $\|E_T^{2,1}\|_1$. For fixed N,
$\Theta_s \to K_s(N)$ as $\Delta t \to 0$ and the limiting value $K_s(N) \to 1$ as $N \to \infty$.

Again, we solve Eq. (19) and present results for the reciprocal of the
spatial effectivity index as a function of $\Delta t = 5^{-p}/2$, $p = 1,2,\ldots,7$, for meshes
with N = 2, 4, and 8 finite elements. These results suggest that for a fixed N,
$\Theta_s \to K_s(N)$ as $p \to \infty$, and that $K_s(N)$ is reasonably close to unity. Furthermore,
it appears that $K_s(N) \to 1$ as N increases.

IV. MESH REFINEMENT. The error estimates developed in Section III are
used to control a simple global mesh refinement procedure that keeps $\epsilon^n$ below a
specified tolerance TOL. Suppose that a solution $U_{BE}^{1,n-1}$ and error estimates
$E_T^{2,n-1}$ and $\epsilon^{n-1}$ have been calculated at time $t_{n-1}$ using a mesh with N elements
and time step $\Delta t_{n-1}$. Further suppose that $\epsilon^{n-1} <$ TOL and calculate solutions
and error estimates at time $t_n = t_{n-1} + \Delta t_n$ using a mesh with N elements and
time step $\Delta t_n = \Delta t_{n-1}$. Our refinement strategy consists of checking $\epsilon^n$ and
proceeding as follows:

(i) if $\epsilon^n \le$ TOL, continue to the next time step;

(ii) if $\epsilon^n >$ TOL and $0.3 < \|E_T^{2,n}\|_1/\epsilon^n < 0.7$, double N, reduce $\Delta t_n$ by thirty
percent, and redo the integration;

(iii) if $\epsilon^n >$ TOL and $0.7 \le \|E_T^{2,n}\|_1/\epsilon^n$, double N and redo the integration;
and

(iv) if $\epsilon^n >$ TOL and $\|E_T^{2,n}\|_1/\epsilon^n \le 0.3$, reduce $\Delta t_n$ by thirty percent and redo
the integration.

Steps (ii) through (iv) are repeated until step (i) is satisfied.
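The decision logic of steps (i) through (iv) can be sketched as follows. The function names `refine` and `advance`, the callable `solve_step`, the retry limit, and the toy error model are all assumptions made for illustration; the actual finite element solves are not reproduced here.

```python
def refine(eps, spatial, tol, n_elements, dt):
    """Apply steps (i)-(iv) once.

    eps     -- total error estimate for the step
    spatial -- H^1 norm of the spatial error estimate
    Returns (accepted, n_elements, dt) for the next attempt.
    """
    if eps <= tol:                       # step (i): accept the step
        return True, n_elements, dt
    ratio = spatial / eps
    if ratio >= 0.7:                     # step (iii): spatial error dominates
        return False, 2 * n_elements, dt
    if ratio <= 0.3:                     # step (iv): temporal error dominates
        return False, n_elements, 0.7 * dt
    return False, 2 * n_elements, 0.7 * dt   # step (ii): refine both


def advance(solve_step, tol, n_elements, dt, max_tries=20):
    """Redo one time step until the estimate meets the tolerance.

    `solve_step(n_elements, dt)` is a hypothetical user-supplied callable
    returning (eps, spatial) for one trial integration.
    """
    for _ in range(max_tries):
        eps, spatial = solve_step(n_elements, dt)
        accepted, n_elements, dt = refine(eps, spatial, tol, n_elements, dt)
        if accepted:
            return n_elements, dt, eps
    raise RuntimeError("tolerance not met within max_tries refinements")


# Toy error model (an assumption for illustration only):
# temporal part ~ dt**2, spatial part ~ 1/N.
def toy_step(n_elements, dt):
    temporal, spatial = dt ** 2, 1.0 / n_elements
    return temporal + spatial, spatial


result = advance(toy_step, tol=0.05, n_elements=16, dt=0.125)
```

With the toy model, the first attempt at N = 16 is rejected with a spatial ratio of 0.8, so only N is doubled and the second attempt is accepted with the time step unchanged.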

The main advantage of this refinement procedure is that the separate
estimates of the spatial and temporal errors allow different strategies to be used
depending upon the dominant component of the error. Thus, if the spatial
component of the error, as measured by $\|E_T^{2,n}\|_1/\epsilon^n$, is large, then only spatial
refinement is used to reduce the total error. The opposite situation arises
when the spatial component of the error is small.

It is important to note that the error estimates used in the refinement
procedure are, at best, only asymptotically correct. Thus, they will not
produce reliable estimates on coarse meshes or when errors are large. With this in
mind, it may be best to replace $\epsilon^n$ by $\|U_T^{1,n} - U_{BE}^{1,n}\|_1 + \|E_T^{2,n}\|_1$ in the
refinement procedure.

The specific choice of the limiting values 0.3 and 0.7 that are used to
determine the dominant component of the error in our refinement procedure is
basically arbitrary. Under normal circumstances, the spatial error measure
$\|E_T^{2,n}\|_1/\epsilon^n \in (0,1)$; thus, it is reasonable to divide (0,1) approximately into
thirds, i.e., (0,0.3), (0.3,0.7), and (0.7,1) corresponding, respectively, to
only temporal refinement, temporal and spatial refinement, and only spatial
refinement. This strategy may not be appropriate in all situations and further
analysis and experimentation are needed to determine optimal refinement criteria.

A  local  refinement  strategy,  such  as  those  considered  in  [2,4-6,8-12],  is 
usually  more  efficient  than  the  global  strategy  presented  herein.  Our  plans  are 
to  combine  refinement  with  a  mesh  moving  method  that  equidistributes  a  global 
error  measure  on  a  mesh  with  a  fixed  number  of  finite  elements  [16,17].  It  may 
be  possible  to  use  a  simple  global  refinement  strategy  in  conjunction  with  such 
a  mesh  moving  method  since  the  local  error  measure  will  be  approximately  the 
same  on  every  subinterval. 

Doubling the number of finite elements whenever spatial refinement is
performed simplifies interpolation issues, but may add more nodes than necessary.
Reducing the time step by thirty percent keeps temporal accuracy comparable to
spatial accuracy, since the temporal convergence rate is $O(\Delta t_n^2)$, while the
spatial convergence rate is $O(1/N)$ (cf. Section III). Thus, doubling N would
correspond to reducing $\Delta t_n$ by a factor of $1/\sqrt{2}$, i.e., by approximately thirty percent.
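The arithmetic behind the thirty percent figure can be written out as follows, assuming the temporal and spatial errors behave exactly like $\Delta t^2$ and $1/N$ (the constants are ignored in this sketch). Halving the spatial error by doubling N is matched by a time step reduction that halves $\Delta t^2$:

```latex
\[
  \frac{(\Delta t_{\mathrm{new}})^2}{(\Delta t_{\mathrm{old}})^2}
  = \frac{1/(2N)}{1/N} = \frac{1}{2}
  \quad\Longrightarrow\quad
  \Delta t_{\mathrm{new}} = \frac{\Delta t_{\mathrm{old}}}{\sqrt{2}}
  \approx 0.707\,\Delta t_{\mathrm{old}},
\]
```

that is, a reduction of about 29.3 percent, which the procedure rounds to thirty percent.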

We apply the above refinement procedure to the following singular parabolic
problem.

Example 2: Consider the partial differential system

$$u_t + \frac{u}{2(1-t)} = -\frac{u_{xx}}{4}, \quad 0 < x < \infty, \quad 0 < t < 1, \tag{24a}$$

$$u(x,0) = e^{-x^2}, \quad 0 \le x < \infty, \tag{24b}$$

$$u_x(0,t) = \lim_{s \to \infty} u_x(s,t) = 0, \quad 0 < t < 1. \tag{24c,d}$$

The exact solution of this problem is

$$u(x,t) = e^{-x^2/(1-t)}. \tag{25}$$

This problem was motivated by our interest in solving the nonlinear
Schrödinger equation in cylindrical coordinates [18]. It is known [19] that the
solution of the Schrödinger equation can "self-focus," i.e., its solution can
become infinite at one point, while decaying elsewhere on the domain. Problems
of this type occur in laser optics.

Such problems are difficult to solve by traditional numerical methods and
illustrate the need for adaptive strategies. The model given by Eq. (24) was
developed as a simple approximation of the behavior of the Schrödinger equation.
Its solution "focuses" in the sense that $u(x,t) \to 0$, $x > 0$, as $t \to 1$, while
$u(0,t) = 1$.
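As a quick consistency check, the exact solution can be substituted into the differential equation numerically. The sketch below reads Eq. (24a) as u_t + u/[2(1-t)] = -u_xx/4 and Eq. (25) as u = exp(-x²/(1-t)); the finite difference step, the sample points, and the function names are assumptions of the example.

```python
import math


def u(x, t):
    """Exact solution, Eq. (25)."""
    return math.exp(-x * x / (1.0 - t))


def residual(x, t, h=1e-4):
    """Central-difference residual of u_t + u/[2(1-t)] + u_xx/4 at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t + u(x, t) / (2.0 * (1.0 - t)) + u_xx / 4.0


# Largest residual over a few sample points; it should be near zero,
# limited only by the finite difference truncation and rounding errors.
worst = max(abs(residual(x, t)) for x in (0.0, 0.3, 1.0) for t in (0.1, 0.5, 0.8))

# "Focusing": the value at x = 0 stays 1 while the profile narrows as t -> 1.
profile = [(t, u(0.0, t), u(0.5, t)) for t in (0.0, 0.5, 0.9, 0.99)]
```

The profile values show the solution at x = 0.5 collapsing toward zero as t approaches 1 while the value at the origin remains one, which is the focusing behavior that makes adaptive refinement attractive for this problem.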

This problem was solved for values of TOL of 0.2, 0.1, and 0.05 and a
summary of the results is presented in Tables 1, 2, and 3, respectively. These
tables present the relevant refinement data for each time step of the solution
process. The time and numerical parameters $\Delta t$ and N at the beginning of a
solution step are found in the columns labelled "Initial Time," "Initial $\Delta t$,"
and "Initial N," respectively. The refined values of $\Delta t$ and N, necessary to
complete the solution step within the given tolerance, are found in the columns
labelled "Refined $\Delta t$," and "Refined N," respectively. The resulting time at the
end of a solution step is found in the column labelled "Final Time." The last
column, labelled "Total Error Estimate," lists the value of $\epsilon^n$ at the successful
completion of a time step. The rows of each table outline the solution process
as it advances through time.

These results indicate that it is sometimes possible to reduce the total
error by refining only in space or only in time, and that the error estimates
$\epsilon^n$, $\|U_T^{1,n} - U_{BE}^{1,n}\|_1$, and $\|E_T^{2,n}\|_1$ can be used to detect when these situations
arise.

TABLE 1. NUMERICAL PARAMETERS AT THE BEGINNING AND THE END OF EACH TIME STEP
FOR SOLVING EXAMPLE 2 WITH TOL = 0.2

Initial   Initial   Initial   Refined   Refined   Final     Total Error
Time      Δt        N         Δt        N         Time      Estimate

0.0000    0.1250     16       0.1250     16       0.1250    0.0885
0.1250    0.1250     16       0.1250     16       0.2500    0.1496
0.2500    0.1250     16       0.0875     32       0.3375    0.1062
0.3375    0.0875     32       0.0875     32       0.4250    0.1931
0.4250    0.0875     32       0.0613     64       0.4863    0.1381
0.4863    0.0613     64       0.0429    128       0.5291    0.0782
0.5291    0.0429    128       0.0429    128       0.5720    0.1144
0.5720    0.0429    128       0.0429    128       0.6149    0.1404
0.6149    0.0429    128       0.0429    128       0.6578
0.6578    0.0429    128       0.0300    256       0.6878

TABLE 2. NUMERICAL PARAMETERS AT THE BEGINNING AND THE END OF EACH TIME STEP
FOR SOLVING EXAMPLE 2 WITH TOL = 0.1

Initial   Initial   Initial   Refined   Refined   Final     Total Error
Time      Δt        N         Δt        N         Time      Estimate

0.0000    0.1250     16       0.1250     16       0.1250    0.0885
0.1250    0.1250     16       0.0875     32       0.2125    0.0496
0.2125    0.1250     32       0.0875     32       0.3000    0.0615
0.3000    0.0875     32       0.0613     64       0.3613    0.0390
0.3613    0.0613     64       0.0613     64       0.4225    0.0549
0.4225    0.0613     64       0.0429     64       0.4654    0.0604
0.4654    0.0429     64       0.0300    128       0.4954    0.0340
0.4954    0.0300    128       0.0300    128       0.5254    0.0528
0.5254    0.0300    128       0.0300    128       0.5554    0.0847
0.5554    0.0300    128       0.0210    128       0.5764    0.0782

TABLE 3. NUMERICAL PARAMETERS AT THE BEGINNING AND THE END OF EACH TIME STEP
FOR SOLVING EXAMPLE 2 WITH TOL = 0.05

Initial   Initial   Initial   Refined   Refined   Final     Total Error
Time      Δt        N         Δt        N         Time      Estimate

0.0000    0.1250              0.1250     32       0.1250    0.0451
0.1250    0.1250              0.0875     64       0.2125    0.0261
0.2125    0.0875              0.0875     64                 0.0377
0.3000    0.0875     64       0.0613     64       0.3613    0.0375
0.3613    0.0613     64       0.0429    128       0.4041    0.0211
0.4041    0.0429    128       0.0429    128       0.4470    0.0300
0.4470    0.0429    128       0.0300    256       0.4470    0.0176
0.4470    0.0300    256       0.0300    256       0.5070    0.0246
0.5070    0.0300    256       0.0210    256       0.5280    0.0202
0.5280    0.0210    256       0.0210    256       0.5490    0.0272

V. DISCUSSION. We developed methods for calculating a posteriori estimates
of the total, spatial, and temporal discretization errors when a vector
system of parabolic partial differential equations is solved using piecewise
linear finite elements in space and the backward Euler method in time. The
error estimates are obtained by using higher order methods, with nodal
superconvergence being used to improve computational efficiency.

The  three  estimates  were  used  to  control  a  global  refinement  procedure  that 
attempts  to  keep  a  global  measure  of  the  error  per  time  step  below  a  prescribed 
tolerance.  Refinement  can  be  performed  in  space,  time,  or  both  space  and  time 
depending  on  the  dominant  component  of  the  error  estimate. 
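
The decision described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the factors 0.7 for Δt and 2 for N are read off the successive values in Tables 1-3, while the dominance threshold `ratio` and the function name choose_refinement are hypothetical and are not taken from the refinement algorithm of Section IV.

```python
def choose_refinement(total, spatial, temporal, tol, dt, n, ratio=2.0):
    """Return (new_dt, new_n) for one global refinement decision.

    `ratio` is a hypothetical threshold: one error component is treated as
    dominant when it is at least `ratio` times the other.
    """
    if total <= tol:
        return dt, n                  # step accepted, nothing to refine
    if spatial >= ratio * temporal:
        return dt, 2 * n              # spatial error dominates: refine in space
    if temporal >= ratio * spatial:
        return 0.7 * dt, n            # temporal error dominates: refine in time
    return 0.7 * dt, 2 * n            # comparable components: refine both
```

For example, a step with total estimate 0.3, spatial estimate 0.25, and temporal estimate 0.05 against TOL = 0.2 would keep Δt and double N under these assumptions.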

Comparison of the exact and estimated errors, presented in Example 1, gives
us some confidence in the accuracy of our error estimates. Additionally, the
results of Example 2 provide an indication of the utility of these estimates in
an adaptive procedure. In certain situations, only spatial or temporal
refinement was needed to keep the total error within the prescribed tolerance,
and our error estimates could be used to determine when these situations arise.

This is the first attempt that we know of that simultaneously addresses
spatial and temporal errors with different refinement strategies. Some
researchers [8-11] have used binary refinement in space and time, but did not
attempt to determine the dominant component of the total discretization error.
As noted, method of lines techniques [1, 2, 4, 5] typically assume that temporal
integration is exact and refine based on estimates of spatial errors. There is
a great potential for techniques that utilize different spatial and temporal
refinement strategies, particularly with problems having singularities. Our
work, however, is still very preliminary and there is still a great deal to be
done. Rigorous convergence results for our error estimates are yet to be
established. The refinement algorithm of Section IV is very simple and will
likely benefit from further experimental and theoretical analyses. We also
anticipate that the inclusion of a mesh moving procedure based on
equidistributing a global error measure [16] will dramatically improve the
performance of our adaptive solution technique. In the future, we would like
to extend our techniques to multi-dimensional problems and to consider higher
order spatial and temporal discretization methods.

Figure 1. Reciprocal of the effectivity index θ versus p, where N = 2^{p+1}
and Δt = 5^{-p/2} when solving Example 1. Note that θ approaches
1 as p increases.

Figure 2. Temporal effectivity index for Δt = 0.7 (curve 1),
Δt = 0.49 (curve 2), Δt = 0.343 (curve 3), and Δt =
0.2401 (curve 4), versus p, where N = 2^p when solving
Example 1. Note that the index approaches 1 as Δt decreases.

ABSTRACT. We discuss mesh moving, static mesh regeneration, and local mesh
refinement algorithms that can be used with a finite difference or finite element scheme to
solve initial-boundary value problems for vector systems of time-dependent partial
differential equations in two space dimensions and time. A coarse base mesh of
quadrilateral cells is moved by an algebraic mesh movement function so as to follow and
isolate spatially distinct phenomena. The local mesh refinement method recursively divides
the time step and spatial cells of the moving base mesh in regions where error indicators
are high until a prescribed tolerance is satisfied. The static mesh regeneration procedure
is used to create a new base mesh when the existing one becomes too distorted.

In  order  to  test  our  adaptive  algorithms,  we  implemented  them  in  a  system  code  with 
an  initial  mesh  generator,  a  MacCormack  finite  difference  scheme  for  hyperbolic  systems, 
and  an  error  indicator  based  upon  estimates  of  the  local  discretization  error  obtained  by 
Richardson  extrapolation.  Results  are  presented  for  several  computational  examples. 

I.  INTRODUCTION.  Many  initial-boundary  value  problems  for  time-dependent 
partial  differential  equations  involve  fine-scale  structures  that  develop,  propagate,  decay, 
and/or  disappear  as  the  solution  evolves.  Some  examples  are  shock  waves  in  compressible 
flows,  boundary  and  shear  layers  in  viscous  flows,  and  reaction  zones  in  combustion 
processes.  The  numerical  solution  of  these  problems  is  usually  difficult  because  the 
nature,  location,  and  duration  of  the  structures  are  often  not  known  in  advance.  Thus, 
conventional numerical approaches that calculate solutions on a prescribed (typically
uniform) mesh often fail to adequately resolve the fine-scale phenomena, have excessive
computational costs, or produce incorrect results. Adaptive procedures that evolve with
the solution offer a robust, reliable, and efficient alternative. Such techniques have been
the subject of a great deal of recent attention (cf. Babuska et al. [7, 9]) and are generally
capable of introducing finer meshes in regions where greater resolution is needed [1, 2, 3,
6, 8, 10, 15, 16], moving meshes in order to follow isolated dynamic phenomena [1, 2, 5,
21, 23, 24, 25, 30], or changing the order of methods in specific regions of the problem
domain [18, 22]. The utility of such adaptive techniques is greatly enhanced when they
are capable of providing an estimate of the accuracy of the computed solution. Local
error estimates are often used as refinement indicators and to produce solutions that satisfy
either local or global accuracy specifications [1, 2, 3, 6, 8, 10, 15, 16]. Successful error
estimates have been obtained using h-refinement [6, 15, 16], where the difference between
solutions on different meshes is used to estimate the error, and p-refinement [1, 2, 3, 8, 16,
22], where the difference between methods of different orders is used to estimate the
error.

Office under Contract Number DAAL 03-86-K-0112. This work was used to partially fulfill the Ph.D.
requirements of the first author at the Rensselaer Polytechnic Institute.

We  discuss  an  adaptive  procedure  that  combines  mesh  movement  and  local 
refinement  for  m-dimensional  vector  systems  of  partial  differential  equations  having  the 
form 

u_t + f(x,y,t,u,u_x,u_y) = [D^1(x,y,t,u) u_x]_x + [D^2(x,y,t,u) u_y]_y,

for t > 0, (x,y) ∈ Ω, (1a)

with initial conditions

u(x,y,0) = u^0(x,y), for (x,y) ∈ Ω ∪ ∂Ω, (1b)

and appropriate well-posed boundary conditions on the boundary ∂Ω of a rectangular
region Ω.

We suppose that a numerical method is available for calculating approximate
solutions and error indicators of (1) at each node of a moving mesh of quadrilateral cells.
Any appropriate numerical method is applicable and the error indicator can either be an
estimate of the local discretization error or another function (e.g., an estimate of the
solution gradient or curvature) that is large where additional resolution is needed and small
where less resolution is desired. Our adaptive algorithm consists of three main parts: (i)
movement of a coarse base mesh, (ii) local refinement of the base mesh in regions where
resolution is inadequate, and (iii) creation and regeneration of the base mesh when it
becomes overly distorted. Our experience (cf. Section III) and that of others [25] indicates
that mesh motion can substantially reduce errors for a very modest computational cost.
Mesh motion alone, however, cannot produce a solution that will satisfy a prescribed error
tolerance in all situations. For this reason, we have combined mesh motion with local
mesh refinement and recursively solve local problems in regions where error tolerances are
not satisfied. The local solution scheme successively reduces the domain size and, thus,
further reduces the cost of the computation. Some problems, e.g., those with severe
material deformations, can result in tangling and distortion of the moving base mesh.
Therefore, we have created a procedure that automatically generates a new base mesh
whenever the old one is unsuitable.

The adaptive procedures described in this paper combine our earlier work on mesh
moving techniques [5] and local refinement procedures [6]. The inclusion of a static mesh
regeneration scheme adds greater reliability and efficiency to these methods. The three
components of our adaptive algorithm are described in Section II; however, frequent
references are made to our previous investigations [5, 6]. A computer code based on the
adaptive algorithm of Section II has been combined with a MacCormack finite difference
scheme and an error indicator based on Richardson extrapolation. It has been used to
solve a sequence of hyperbolic problems (i.e., problems having the form (1) with
D^1 = D^2 ≡ 0) and our findings on three examples, where we have attempted to appraise
the relative costs and benefits of the mesh moving and local refinement portions of our
adaptive algorithm, are reported in Section III. We have also compared solutions obtained
by adaptive techniques to those obtained using stationary uniform meshes. In all three
examples, solutions obtained by adaptive techniques cost less than solutions obtained on
stationary uniform meshes having approximately the same accuracy. The mesh moving
technique added approximately ten percent to the computational time of the adaptive
algorithm and greatly improved the results. Most of the computational time was devoted to
calculating the solution and error indicators, and not to the overhead induced by the
refinement procedure. Although we are greatly encouraged by our results, our adaptive
procedures are far from complete. Some possible improvements and future considerations
are discussed in Section IV.


II. ALGORITHM DESCRIPTION. A top-level description of our adaptive
procedure is presented in Figure 1 in a pseudo-PASCAL language. This procedure is called
adaptive_PDE_solver and it integrates a system of partial differential equations from time
tinit to tfinal and attempts to keep the local error indicators below a tolerance of tol.
The base level time step Δt is initially specified, but may be changed, as needed, during
the integration.


procedure adaptive_PDE_solver(tinit, Δt, tfinal, tol: real; M, N: integer);
begin
    Generate an initial base mesh;
    t := tinit;
    while t < tfinal do
    begin
        Move the base mesh for the time step t to t + Δt;
        local_refine(0, t, Δt, tol);
        t := t + Δt;
        Select an appropriate Δt;
        if base mesh is too distorted then regenerate a base mesh
    end
end { adaptive_PDE_solver };


Figure  1.  Pseudo-PASCAL  description  of  an  adaptive  procedure  to  solve  the 
partial  differential  system  (1)  from  t  =  tinit  to  tfinal  to  within  a  tolerance  of 
tol. 
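
The control flow of Figure 1 can be mirrored in Python. The following is a minimal sketch, not the authors' code: the mesh generator, mesh mover, refinement routine, step selector, and distortion test are passed in as hypothetical callbacks, so that only the loop structure is shown.

```python
def adaptive_pde_solver(tinit, dt, tfinal, tol,
                        generate_mesh, move_mesh, local_refine,
                        select_dt, too_distorted):
    """Driver loop following Figure 1; every callback is a hypothetical stand-in."""
    mesh = generate_mesh()                 # generate an initial base mesh
    t = tinit
    steps = 0
    while t < tfinal:
        mesh = move_mesh(mesh, t, dt)      # move the base mesh from t to t + dt
        local_refine(0, t, dt, tol)        # solve and refine, starting at level 0
        t += dt
        dt = select_dt(dt)                 # select a step for the next base step
        if too_distorted(mesh):
            mesh = generate_mesh()         # static regeneration of the base mesh
        steps += 1
    return t, steps
```

Because each module is a separate callable, any one of them can be replaced or omitted without touching the loop, which reflects the uncoupled design described in the text.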


The rectangular domain Ω is initially discretized into a coarse moving spatial grid of
M × N quadrilateral cells. An initial base mesh is generated from this mesh by increasing
the values of M and N, as necessary, and moving the mesh so that it is concentrated in
regions where error indicators are large (cf. Section II.3). The base mesh is moved for
each base time step Δt (cf. Arney and Flaherty [5] and Section II.1) and the partial
differential system (1) is solved on this mesh for a base time step. This is followed by a
recursive local mesh refinement in regions where error indicators are larger than tol. The
local refinement procedure local_refine was described in Arney and Flaherty [6] and its
major features are summarized in Section II.2. The integration for each base-mesh time
step is concluded by the selection of a new value of Δt for the subsequent time step and
the generation of a new base mesh (cf. Section II.3), if necessary.

The  mesh  moving,  local  refinement,  and  mesh  regeneration  algorithms  are  uncoupled 
from  each  other  as  well  as  from  the  procedures  used  to  solve  the  partial  differential  system 
and  calculate  local  error  indicators.  This  reduces  computational  costs  and  provides  a  great 
deal  of  flexibility.  Thus,  individual  modules  can  easily  be  replaced,  omitted,  or  combined 
with  other  software. 

II.1. Mesh Moving Algorithm. Mesh moving strategies should produce a smooth
mesh where the sizes of neighboring computational cells vary slowly and cell angles differ
only by modest amounts from right angles. It is, of course, essential for the nodes of the
mesh to remain within Ω and for cells not to overlap. Meshes that violate these
conditions can produce large discretization errors that overwhelm the positive effects of mesh
moving. Our mesh moving procedure is based on an intuitive approach rather than more
analytic error equidistribution (cf., e.g., Coyle et al. [19] or Dwyer [23]) and variational
approaches (cf. Brackbill and Saltzman [17]). The essential idea is to move the mesh so
as to follow isolated nonuniformities, such as wave fronts, shock layers, and reaction
zones. This generally reduces dispersive errors and allows the use of larger time steps
while maintaining accuracy and stability.

At each base time, we scan the M × N base mesh of quadrilateral cells and locate
"significant error nodes" as those having error indicators greater than twice the mean nodal
error indicator and also greater than ten percent of tol. This empirical strategy avoids
having the mesh respond to fluctuations when error indicators are too small, but is
sensitive enough to avoid missing dynamic phenomena associated with large error indicators.
If there are no significant error nodes, computation proceeds on a stationary mesh.
Otherwise, the nearest neighbor clustering algorithm of Berger and Oliger [15] is used to
gather the significant error nodes into clusters. In this iterative algorithm, a cluster is first
defined to consist of one arbitrary significant error node. Other significant error nodes are
added to the cluster if they are within a specified minimum intercluster distance from the
nearest node in the cluster. We take the minimum intercluster distance to be the length of
a cell diagonal. New clusters are established for nodes that do not belong to any existing
cluster. Clusters are united when a node is determined to belong to more than one of them.
Upon completion of the algorithm, (i) nodes in different clusters will be separated by at
least the minimum intercluster distance, and (ii) no node in a cluster with more than one
node will be further than the minimum intercluster distance from its nearest neighbor in
the cluster.
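
The selection and clustering steps above can be sketched in Python. This is an illustrative reading of the description, not the Berger and Oliger implementation: the function names are hypothetical, nodes are plain (x, y) tuples, and a node joins a cluster when it lies within min_dist of any member.

```python
import math


def significant_nodes(indicators, tol):
    """Indices whose indicator exceeds twice the mean and ten percent of tol."""
    mean = sum(indicators) / len(indicators)
    return [i for i, e in enumerate(indicators)
            if e > 2.0 * mean and e > 0.1 * tol]


def cluster_nodes(points, min_dist):
    """Group points into clusters linked by distances of at most min_dist."""
    clusters = []
    for p in points:
        # Indices of existing clusters that contain a node close enough to p.
        near = {k for k, c in enumerate(clusters)
                if any(math.dist(p, q) <= min_dist for q in c)}
        # Unite every such cluster with p; start a new cluster if there is none.
        merged = [p] + [q for k in sorted(near) for q in clusters[k]]
        clusters = [c for k, c in enumerate(clusters) if k not in near]
        clusters.append(merged)
    return clusters
```

Uniting all clusters that a new node touches is what guarantees property (i): two nodes closer than the minimum intercluster distance can never end up in different clusters.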

Following Berger and Oliger [15], we generate near minimum area rectangles that
contain each cluster. The principal axes of each rectangle are the major and minor axes of
an enclosed ellipse having the same first and second moments as the nodes in the cluster.
Thus, if (x_i, y_i) are the coordinates of a node and (x_m, y_m) are the mean coordinates of all
nodes in the cluster, then the axes of the rectangle are in the directions of the eigenvectors
of the symmetric (2 x 2) matrix

\begin{bmatrix}
\sum (x_i^2 - x_m^2)       & \sum (x_i y_i - x_m y_m) \\
\sum (x_i y_i - x_m y_m)   & \sum (y_i^2 - y_m^2)
\end{bmatrix},   (2)

where the summations range over all nodes in the cluster.

For many problems, there may be too small a percentage of significant error nodes
within a cluster. In order to reduce this inefficiency and provide some alignment with,
e.g., curved wave fronts, the rectangles are checked for efficiency by determining the
percentage of significant error nodes in each rectangle. If a fifty-percent efficiency is not
achieved, the rectangle is iteratively bisected in the direction of its major axis until all
clusters have at least a fifty-percent efficiency.
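
The axis computation from matrix (2) and the efficiency check can be sketched in Python as follows. Several choices here are assumptions made for illustration: the eigenvector direction of the 2 x 2 matrix is obtained from the closed-form principal angle rather than a general eigensolver, the rectangle is taken as the tight bounding box of the cluster in principal coordinates, and the function names are hypothetical.

```python
import math


def cluster_axes(nodes):
    """Return (center, major, minor) unit axes from the moment matrix (2)."""
    n = len(nodes)
    xm = sum(x for x, _ in nodes) / n
    ym = sum(y for _, y in nodes) / n
    a = sum(x * x - xm * xm for x, _ in nodes)
    b = sum(x * y - xm * ym for x, y in nodes)
    c = sum(y * y - ym * ym for _, y in nodes)
    # Principal angle of the symmetric matrix [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))
    return (xm, ym), major, minor


def rectangle_efficiency(flagged, all_nodes, eps=1e-12):
    """Fraction of mesh nodes inside the cluster rectangle that are flagged.

    Assumes every flagged node also appears in all_nodes.
    """
    center, major, minor = cluster_axes(flagged)

    def local(p):
        dx, dy = p[0] - center[0], p[1] - center[1]
        return (dx * major[0] + dy * major[1], dx * minor[0] + dy * minor[1])

    coords = [local(p) for p in flagged]
    lo1, hi1 = min(c[0] for c in coords), max(c[0] for c in coords)
    lo2, hi2 = min(c[1] for c in coords), max(c[1] for c in coords)
    inside = sum(1 for p in all_nodes
                 if lo1 - eps <= local(p)[0] <= hi1 + eps
                 and lo2 - eps <= local(p)[1] <= hi2 + eps)
    return len(flagged) / inside
```

A rectangle whose efficiency falls below 0.5 would then be bisected along its major axis and each half checked again.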

We determine node movement from the velocity of propagation, the orientation, and
the size of error clusters. We assume that nodes in the same cluster have related solution
characteristics, so that we can determine individual node movement from the propagation
of the center of the error cluster. Each cluster moves according to the differential
equation

\ddot{r}_m + \lambda \dot{r}_m = 0,   (3)

where r_m(t) = [x_m(t), y_m(t)]^T is the position of the center of an error cluster and
(˙) := d( )/dt. The choice of the parameter λ can be critical in certain situations. If λ is
selected too large, the system (3) will be stiff and computationally expensive. On the
other hand, if λ is too small, the mesh can oscillate from time step to time step. Coyle et
al. [19] and Adjerid and Flaherty [2] suggested some adaptive procedures for choosing λ;
however, we found no appreciable differences in results or computation times when λ
varied significantly. The examples of Section III were calculated with λ = 1.
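
As an illustration of the explicit integration of the cluster motion, the following Python sketch advances a cluster center with forward Euler. It rests on an assumption about the form of (3), namely the damped equation r_m'' + λ r_m' = 0 written as a first-order system in position and velocity; the function name and the substeps parameter are hypothetical.

```python
def advance_cluster_center(r, v, lam, dt, substeps=1):
    """Forward Euler for r' = v, v' = -lam * v over one base time step dt.

    r and v are (x, y) tuples; the assumed model is r'' + lam * r' = 0.
    """
    h = dt / substeps
    for _ in range(substeps):
        r = (r[0] + h * v[0], r[1] + h * v[1])               # position update
        v = ((1.0 - lam * h) * v[0], (1.0 - lam * h) * v[1])  # damped velocity
    return r, v
```

With forward Euler the velocity factor 1 - λh must stay within [-1, 1], so a large λ forces small substeps, which is one way to see the stiffness mentioned in the text.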

We solve (3) for each base time step and each cluster using an explicit numerical
method. The center of an error cluster is moved a distance Δr_m = r_m(t+Δt) - r_m(t) at
the base time t. Let Δr_{m,1} and Δr_{m,2} denote the projections of Δr_m onto the major and
minor axes of the rectangular cluster. We use the one-dimensional piecewise linear
function

d_{i,inside} = \begin{cases}
\Delta r_{m,i} (3/2 + x_i/w_i), & \text{if } -3w_i/2 \le x_i < -w_i/2, \\
\Delta r_{m,i},                 & \text{if } -w_i/2 \le x_i < w_i/2, \\
\Delta r_{m,i} (3/2 - x_i/w_i), & \text{if } w_i/2 \le x_i < 3w_i/2, \\
0,                              & \text{otherwise},
\end{cases}   (4)

to move the nodes of the mesh along the two principal axial directions of the error
clusters. The cluster referred to in (4) has dimensions w_1 × w_2 and (x_1, x_2) are local
Cartesian coordinates of a node in the principal directions of the cluster relative to its
center. For i = 1, nodes in the range of the cluster (-3w_1/2 ≤ x_1 ≤ 3w_1/2,
-w_2/2 ≤ x_2 ≤ w_2/2) are moved a distance d_{1,inside}. This situation is shown in Figure 2.
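
The piecewise linear function (4) translates directly into code. The Python sketch below assumes the reading of (4) in which the displacement is the full cluster displacement across the cluster width and ramps linearly to zero over one further cluster width on each side; the name d_inside and its argument names are illustrative.

```python
def d_inside(x, w, dr):
    """Displacement of a node at local coordinate x along one principal axis.

    w  : cluster width along this axis
    dr : projection of the cluster-center displacement onto this axis
    """
    if -1.5 * w <= x < -0.5 * w:
        return dr * (1.5 + x / w)      # ramp up from 0 to dr
    if -0.5 * w <= x < 0.5 * w:
        return dr                      # nodes inside the cluster move with it
    if 0.5 * w <= x < 1.5 * w:
        return dr * (1.5 - x / w)      # ramp down from dr to 0
    return 0.0
```

The function is continuous at x = ±w/2 and vanishes at x = ±3w/2, so neighboring nodes receive nearly equal displacements and the mesh stays smooth.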

In order to maintain smooth mesh motion throughout the domain, nodes outside the
range of a cluster move a distance

d_{i,outside} = d_{i,inside} [1 - (2z/D)],   i = 1, 2,   (5)

where z is the shortest distance to the range of the cluster (cf. Figure 2) and D is the
diagonal of Ω. For each cluster, the mesh is moved in the direction of the major axis
(i = 1) using (4) and (5). This is followed by a similar procedure in the direction of the
minor axis (i = 2). The distances d_{i,inside} and d_{i,outside} are reduced near ∂Ω in order to
prevent nodes from leaving Ω. In particular, we recalculate d_{i,j} as d_{i,j}[min(1, b/c)],
i = 1, 2, j = inside, outside, where b is the distance of the node to the boundary and c is
twice the length of a cell diagonal on a uniform mesh having the same number of cells as
the moving mesh. Nodes on domain boundaries, except corner nodes, which are not
moved, are restrained to move along the boundary. Finally, the mesh moving algorithm is
not restricted to the functions given by (4) and (5), and several other choices are possible.

Figure 2. A rectangular w_1 × w_2 error cluster. Nodes within the range of the
cluster, 3w_1 × w_2, are moved a distance d_{1,inside} in the x_1 principal direction
according to eq. (4). Nodes outside the range of the cluster are moved a distance
d_{1,outside} in the x_1 direction according to eq. (5). The distance z is the shortest
distance to the range of the cluster.
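
The attenuation outside a cluster and the damping near the boundary can be sketched in Python as two small helpers. Both are assumptions about the intended formulas: (5) is read as d_outside = d_inside * (1 - 2z/D), and the boundary correction as multiplication by min(1, b/c). The function names are hypothetical.

```python
def d_outside(d_in, z, diag):
    """Displacement of a node at distance z from the range of a cluster.

    d_in : value of the inside function for this node's axial coordinate
    diag : length D of the diagonal of the domain
    """
    return d_in * (1.0 - 2.0 * z / diag)


def damp_near_boundary(d, b, c):
    """Scale a displacement by min(1, b/c) for a node at distance b from the
    boundary, where c is twice a uniform-mesh cell diagonal."""
    return d * min(1.0, b / c)
```

A node lying on the boundary (b = 0) receives no normal displacement under this scaling, while nodes farther than c from the boundary are unaffected.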

II.2. Local Refinement Algorithm. As shown in Figure 1, the local refinement
procedure is invoked after the base mesh has been moved for a base time step. Our
refinement strategy consists of first calculating a preliminary solution on the base mesh for
a base time step. An error indicator is used to locate regions where greater resolution is
needed. Finer grids are adaptively created in these high-error regions by locally bisecting
the time step and the sides of the quadrilateral cells of the base grid, and the solution and
error indicators are computed on the finer grids. The refinement scheme is recursive; thus,
fine subgrids may be refined by adaptively creating even finer subgrids. This relationship
leads naturally to a tree data structure. Information regarding the geometry, solution, and
error indicators of the base grid is stored as the root node or level 0 of the tree. Subgrids
of the base grid are offspring of the root node and are stored as level 1 of the tree. The
structure continues, with a grid at level l having a parent coarser grid at level l - 1 and
any finer offspring grids at level l + 1. Grids at level l of the tree are given an arbitrary
ordering and we denote them as G_{l,j}, j = 1, 2, ..., N_l, where N_l is the number of grids at
level l. Our refinement procedures permit grids at the same level of a two-dimensional
problem to intersect and overlap; however, offspring grids must be properly nested within
the boundaries of their parent grid. A one-dimensional grid with its appropriate tree
structure for a base time step is shown in Figure 3.
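
A tree of grids of this kind can be represented with a small recursive data structure. The Python sketch below is a one-dimensional illustration only: the class name Grid, the bounds tuple, and the helper grids_at_level are hypothetical, and the solution and error-indicator storage of the actual code is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """One grid in the refinement tree (one-dimensional for illustration)."""
    level: int
    bounds: tuple                      # (left, right) extent of the grid
    children: list = field(default_factory=list)

    def add_child(self, bounds):
        """Attach an offspring grid, enforcing proper nesting in the parent."""
        lo, hi = bounds
        if not (self.bounds[0] <= lo < hi <= self.bounds[1]):
            raise ValueError("offspring grid must be nested within its parent")
        child = Grid(self.level + 1, bounds)
        self.children.append(child)
        return child


def grids_at_level(root, level):
    """Collect the grids G_{l,j} at a given level, in tree order."""
    if root.level == level:
        return [root]
    return [g for child in root.children for g in grids_at_level(child, level)]
```

The nesting check applies only between parent and offspring; sibling grids at the same level are free to overlap, as the text permits.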

A top-level pseudo-PASCAL description of a recursive local refinement algorithm
that solves systems of the form (1) on the tree of grids described above is presented in
Figure 4. The procedure local_refine integrates partial differential equations on the grids
G_{l,j}, j = 1, 2, ..., N_l, at level l of the tree from time tinit to tinit + Δt and attempts to
satisfy a prescribed local error tolerance tol. For each grid at level l, a solution and error
indicators are calculated at time tinit + Δt. Additional finer grids are introduced in
regions where the error indicators exceed the prescribed tolerance tol and the differential
system is solved again on the finer grids using two time steps of duration Δt/2 and a
tolerance of tol/2. Observe that the solution, error indicators, and refined subgrids are
calculated for all grids at level l before calculating any solutions at level l + 1. Implicit in
local_refine are the assumptions that a solution can be computed on any grid and that
refinement terminates. If either of these assumptions is violated, the procedure
terminates in failure.
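
The recursion can be illustrated with a stripped-down Python sketch. It keeps only the control flow described above: the callback solve, which stands in for integrating on all grids of a level and returning the largest error indicator, the log list, and the max_level guard used to signal failure are all hypothetical additions.

```python
def local_refine(level, tinit, dt, tol, solve, log, max_level=10):
    """Recursive refinement in time: halve dt and tol until indicators pass.

    solve(level, tinit, dt) is a hypothetical callback returning the largest
    error indicator over all grids at this level.
    """
    if level > max_level:
        raise RuntimeError("refinement did not terminate")
    worst = solve(level, tinit, dt)
    log.append((level, tinit, dt))
    if worst > tol:
        # Two steps of size dt/2 on the finer level, each with tolerance tol/2.
        local_refine(level + 1, tinit, dt / 2, tol / 2, solve, log, max_level)
        local_refine(level + 1, tinit + dt / 2, dt / 2, tol / 2, solve, log, max_level)
```

With a model indicator proportional to dt squared, each halving of the step reduces the indicator by four while the tolerance is only halved, so the recursion ends after a few levels.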

Our technique for introducing finer subgrids consists of four steps: (i) an initial scan
of each level l grid to locate "untolerable-error" nodes as those where the error indicator
exceeds the prescribed tolerance tol, (ii) clustering any untolerable nodes into rectangular
regions, (iii) buffering the clustered regions in order to reduce problems associated with
prescribing initial and boundary conditions at coarse/fine grid interfaces, and (iv) cellularly
refining the level l meshes and time step within the buffered clusters. Of course, if there
are no untolerable-error nodes, the solution is acceptable and further refinement is
unnecessary.

The same clustering algorithm of Berger and Oliger [15] that was used to move the
base mesh is also used to group untolerable-error nodes for refinement. Each rectangular
error cluster is enlarged by increasing its major and minor axes by twice the size of the
average cell edge within the cluster. The region between the enlarged and original error
clusters provides a buffer so that artificial internal boundary conditions (that are discussed
below) will be prescribed at low-error nodes as far as possible and fine-grid errors will not
propagate through the buffer in a time step.

Refined subgrids are created by bisecting the time step and edges of each cell of the
parent mesh that intersects the buffered rectangular error clusters. Coarse mesh motion is
maintained on the refined grids so that after two time steps of size Δt/2, cells of the
refined grids will be properly nested within those of their parent grid. Additional details
of the refinement algorithm and data structures are presented in Arney and Flaherty [6].

Artificial initial and boundary data must be determined from solutions on other grids
in order to calculate the solution and error indicators on refined subgrids. Furthermore,
solutions on finer grids are used to replace those on coarser grids at common nodal
locations.

Figure 3. Coarse and refined grids (top) and their tree representation (bottom)
for a one-dimensional example.

Initial  data  for  a  subgrid  is  calculated  directly  from  the  initial  function  u0(r  j)  at 
t  =  0.  For  t  >  0,  initial  data  is  obtained  by  interpolation  using  the  solution  at  the  same 
time  on  the  finest  available  mesh.  In  order  to  provide  data  for  this  interpolation,  we  save 
all  solution  values  on  previous  subgrids  until  they  are  no  longer  needed  due  to  advance¬ 
ment  in  time  of  an  acceptable  solution.  Bilinear  functions  using  the  solution  values  at  the 
four  vertices  of  the  finest  existing  cell  are  used  to  obtain  the  solution  at  the  nodes  of  cells 
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procedure  local_refine(  /:  integer;  tinit.  At,  tol:  real ); 
begin 

for  j  :=  1  to  N[l]  do 
begin 

Integrate  the  partial  differential  system  from  tinit  to  tinit  +  At 
on  grid  G[IJ]\ 

Calculate  error  indicators  at  tinit  +  At  at  all  nodes  of  grid 
G[IJ  ]; 

if  any  error  indicators  >  tol  then  introduce  level  /  +  1  subgrids 
of  G  [IJ] 
end  {  for  ); 

if  any  error  indicators  >  tol  then 
begin 

local_refine(/  +  1,  tinit,  At/2,  toll 2); 
local_refine(/  +  1,  tinit  +  At  12,  At/2,  toll 2) 

end 

end  {  local_refine  }; 


Figure 4. Pseudo-PASCAL description of a recursive local refinement procedure
to find a solution of the partial differential system (1) on all grids at level
l of the tree.


of the refined mesh. Further analysis is needed regarding the effects on accuracy and stability
and the proper order of this interpolation. Bieterman, Flaherty, and Moore [16] give
an example where the fine-scale structure of a solution was lost by interpolation from too
coarse a mesh.
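
The recursion of Figure 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: each level is treated as a single grid (the loop over the N[l] grids of a level is collapsed), the integration and error-indicator computation are replaced by a hypothetical callback `indicator`, and a `max_level` cap is added so that the sketch always terminates. The names `local_refine`, `indicator`, and `log` are ours and do not come from the authors' code.

```python
def local_refine(level, tinit, dt, tol, indicator, log, max_level=5):
    """Advance the level-`level` grid from tinit to tinit + dt, refining recursively.

    `indicator(level, tinit, dt)` is a hypothetical stand-in for "integrate the
    system and return the largest error indicator on the grid".
    """
    log.append((level, tinit, dt))            # record the step that was "integrated"
    if indicator(level, tinit, dt) > tol and level < max_level:
        # Two half steps on the finer level, each with half the tolerance.
        local_refine(level + 1, tinit, dt / 2, tol / 2, indicator, log, max_level)
        local_refine(level + 1, tinit + dt / 2, dt / 2, tol / 2, indicator, log, max_level)
    return log


# Toy indicator: a local error proportional to dt**2 (an assumption for illustration).
steps = local_refine(0, 0.0, 1.0, 0.5, lambda lvl, t, dt: 0.8 * dt ** 2, [])
```

With this toy indicator the base step is rejected once, and the two half steps on level 1 are accepted, so `steps` holds one level-0 entry followed by two level-1 entries.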

In a similar manner, boundary data for refined meshes are calculated directly from
the prescribed boundary conditions on portions of subgrids that intersect ∂Ω. Dirichlet
boundary data is prescribed on the edges of subgrids that are in the interior of Ω by interpolating
the solution from coarser meshes. Bilinear functions using the solution values at
the four vertices of the adjacent face of the finest existing space-time cell are used to
obtain solution values for the nodes of refined cells.
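
As an illustration of the bilinear interpolation used for both initial and boundary data, the following minimal sketch evaluates a bilinear function on a single cell. It assumes the cell has been mapped to the unit square with local coordinates (s, t); the function name `bilinear` and this mapping are our assumptions, not part of the original description.

```python
def bilinear(u00, u10, u01, u11, s, t):
    """Interpolate inside a cell from its four vertex values.

    (s, t) are local coordinates in [0, 1] x [0, 1]; u00 is the value at
    (s, t) = (0, 0), u10 at (1, 0), u01 at (0, 1), and u11 at (1, 1).
    """
    return ((1 - s) * (1 - t) * u00 + s * (1 - t) * u10
            + (1 - s) * t * u01 + s * t * u11)


# Value at the midpoint of a coarse cell, where a binary refinement places a new node.
mid_value = bilinear(1.0, 2.0, 3.0, 4.0, 0.5, 0.5)
```

At the cell midpoint the interpolant reduces to the average of the four vertex values, and at a vertex it reproduces that vertex value.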

Acceptable  fine-mesh  solutions  are  used  to  replace  solutions  at  the  nodes  of  coarser 
grids  that  lie  within  the  untolerable-error  portions  of  clusters.  Solutions  at  low-error  nodes 
in  the  buffer  zones  of  clusters  are  not  replaced  in  order  to  avoid  possible  contamination  of 
accurate  solutions.  When  fine  grids  overlap  each  other  in  an  untolerable-error  region,  the 
average value of the solutions at common fine-grid nodes is used to replace the appropriate
coarse grid solution. Boundary effects do not propagate through a sufficiently large
buffer  and,  thus,  have  no  effect  on  the  solution  within  the  untolerable-error  region  of  a 
cluster  when  an  explicit  numerical  scheme  is  used  for  the  integration.  Greater  care  is 
needed  when  implicit  integration  methods  are  used,  since  artificial  boundary  conditions 
can affect the accuracy, convergence, and stability of the solution at all nodes in the cluster
regardless of the size of the buffer.
Stability and conservation of, e.g., fluxes at interfaces between coarse and fine
meshes  must  be  investigated  further,  particularly  in  two  dimensions.  For  one-dimensional 
problems,  Berger  and  Oliger  [15]  showed  that  linear  interpolation  of  solutions  from  a 
coarse to a fine mesh produced no instabilities in the Lax-Wendroff scheme. Berger
[14]  also  discussed  conservation  at  mesh  interfaces  and  proposed  explicit  enforcement  of 
conserved  quantities  at  coarse/fine  mesh  boundaries.  Rai  [29]  presented  some  finite 
difference  schemes  that  maintained  conservation  at  grid  interfaces  for  two-dimensional 
compressible  flow  problems. 

II.3. Initial Mesh Construction and Regeneration. The efficiency of our adaptive
mesh moving and refinement strategies depends on our ability to generate a suitable
initial  mesh  and  to  regenerate  a  new  base  mesh  should  it  become  severely  distorted  at  later 
times.  The  proper  base  mesh  can  reduce  the  need  for  refinement  and,  thus,  increase 
efficiency. 

The  two  essential  elements  of  a  mesh  generation  or  regeneration  procedure  are  the 
determination  of  the  number  of  nodes  and  their  optimal  location.  A  base  mesh  having  too 
few  nodes  will  result  in  excessive  refinement  while  one  having  too  many  nodes  will 
reduce  efficiency.  Many  mesh  generation  procedures  have  been  developed  (cf.,  e.g., 
Thompson [31] or Brackbill and Saltzman [17]); however, the best one to use in conjunction
with an adaptive procedure is still far from being established. Our current approach
to  mesh  generation  is  to  use  the  error  indicators  computed  by  a  trial  solution  to  determine 
an  initial  mesh  that  approximately  equidistributes  the  error  indicators. 

To begin, we create a uniform M × N rectangular mesh using prescribed values of M
and N that reflect the coarsest mesh that should be used to calculate a solution. We solve
the system (1) for a base time step Δt on the uniform stationary base mesh and compute
the solution and error indicators. Local mesh refinement is performed as described in Section
II.2 until the prescribed tolerance is attained. We use this solution to determine the
number of nodes K in a new base mesh as


$$ K = MN + \sum_{l=1}^{n} (3/4)^{l} K_l, \tag{6a} $$

where $K_l$ is the number of nodes introduced at level $l$ and $n$ is the total number of levels
in the tree. Having computed K, we calculate the dimensions of a new $\bar{M} \times \bar{N}$ mesh as

$$ \bar{M} = \sqrt{KM/N}, \qquad \bar{N} = \sqrt{KN/M}. \tag{6b} $$

The bars have been omitted on M and N in the algorithms displayed in Figures 1 and 4
and in all further discussions.
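
A short worked computation may help fix the node-count rule: K is MN plus the sum over levels l = 1, ..., n of (3/4)^l times the number of nodes introduced at level l, and the new dimensions are sqrt(KM/N) and sqrt(KN/M), which preserve the ratio M/N and give a product equal to K. The sketch below is an illustration only; the function name `new_base_mesh` and the list argument are our assumptions, and since the text does not say how the dimensions are rounded to integers the values are returned unrounded.

```python
import math


def new_base_mesh(M, N, nodes_per_level):
    """Return (K, M_bar, N_bar) from (6a) and (6b).

    nodes_per_level[l - 1] is K_l, the number of nodes introduced at level l.
    """
    K = M * N + sum(0.75 ** l * K_l for l, K_l in enumerate(nodes_per_level, start=1))
    M_bar = math.sqrt(K * M / N)
    N_bar = math.sqrt(K * N / M)
    return K, M_bar, N_bar


# A 10 x 10 trial mesh that needed 400 extra nodes at level 1.
K, M_bar, N_bar = new_base_mesh(10, 10, [400])
```

Here K = 100 + (3/4)(400) = 400, so the new base mesh would be about 20 × 20.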

Node placement for the new base mesh is accomplished by locating all nodes of the
original base mesh having error indicators that are greater than twice the mean error indicator.
These nodes are then grouped into rectangular clusters using the clustering algorithm
of Section II.1. A uniform base mesh is generated when there are no nodes having
error indicators that are greater than twice the mean error indicator.

Nodes are moved towards the center of the nearest error cluster unless they are
within a two-cell diagonal range of two or more error clusters. A node that is near only
one cluster is moved four-tenths of its distance to the center of that cluster unless this distance
is greater than 12.5 times the average cell diagonal, in which case it is moved five times
the average cell diagonal. Nodes that are within a two-cell diagonal range of two or more
clusters are moved by four-tenths of a weighted average of the distances to the centers of the
involved clusters. Nodes on ∂Ω remain on ∂Ω. Nodes near the boundary move a reduced
distance in order to prevent the formation of large elements. When an error cluster intersects
opposite boundaries of Ω, nodes are not moved in the direction of the major axis of
the cluster. This construction generates a base mesh that depends on the solution of the
partial differential system as well as its initial condition.
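
The single-cluster movement rule can be written as a small function. This is a sketch of that one rule only; the weighted average used near several clusters and the reduced motion near boundaries are not specified in enough detail here to reproduce, and the name `node_displacement` is our own.

```python
def node_displacement(distance, avg_diagonal):
    """Distance a node moves toward the center of its single nearest error cluster."""
    if distance > 12.5 * avg_diagonal:
        return 5.0 * avg_diagonal      # far nodes: capped move
    return 0.4 * distance              # nearby nodes: four-tenths of the way


near = node_displacement(10.0, 1.0)    # within 12.5 diagonals of the cluster center
far = node_displacement(20.0, 1.0)     # beyond 12.5 diagonals
```

Because 0.4 × 12.5 = 5, the two branches agree at the threshold, so the displacement is a continuous function of distance.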

The base mesh can become severely distorted for some problems (cf. Arney and
Flaherty [5]) and we would like a capability for generating a new base mesh whenever
this  happens.  Since  the  new  mesh  is  created  at  a  specific  time,  rather  than  by  mesh 
motion,  we  refer  to  this  process  as  static  mesh  regeneration.  Our  static  mesh  regeneration 
procedure  consists  of  three  steps:  (i)  determining  that  there  is  a  need  for  a  new  base  mesh, 
(ii)  creating  the  new  base  mesh,  and  (iii)  interpolating  the  solution  from  the  old  to  the  new 
base  mesh. 

A  mesh  is  regenerated  when  any  interior  angle  of  a  cell  is  less  than  50  or  greater 
than  130  degrees,  the  aspect  ratio  of  any  cell  is  greater  than  15,  or  the  mesh  ratio  of  adja¬ 
cent  cells  exceeds  5  or  is  less  than  1/5.  In  the  present  context,  the  aspect  ratio  is  defined 
as  the  average  length  divided  by  the  average  width  of  a  cell  and  the  mesh  ratios  are 
defined  as  the  ratio  of  the  lengths  and  widths  of  adjacent  cell  sides. 
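
The three distortion tests can be collected into one predicate. The sketch below assumes the angles, aspect ratios, and adjacent-cell mesh ratios have already been computed and are passed in as flat lists; the function name and argument layout are hypothetical.

```python
def needs_regeneration(interior_angles_deg, aspect_ratios, adjacent_mesh_ratios):
    """Apply the three distortion tests quoted above to flat lists of cell measures."""
    bad_angle = any(a < 50.0 or a > 130.0 for a in interior_angles_deg)
    bad_aspect = any(r > 15.0 for r in aspect_ratios)
    bad_ratio = any(r > 5.0 or r < 1.0 / 5.0 for r in adjacent_mesh_ratios)
    return bad_angle or bad_aspect or bad_ratio


ok_mesh = needs_regeneration([90, 85, 95, 90], [1.0, 2.5], [1.0, 0.8])
skewed = needs_regeneration([45, 135, 90, 90], [1.0], [1.0])
```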

A  new  base  mesh,  having  the  same  number  of  nodes  as  the  old  one,  is  generated 
using  the  procedure  described  above  for  creating  an  initial  base  mesh.  The  error  clusters 
for  the  existing  mesh  are  used  to  generate  the  new  base  mesh,  so  that  new  clusters  do  not 
have  to  be  computed.  This  process  appears  to  reduce  angle  deviations  from  ninety 
degrees,  control  aspect  ratios,  and  mollify  adjacent  mesh  ratios. 

Once a new base mesh has been constructed, the solution on the old one is interpolated
to the new one by using bilinear interpolation with respect to the cells of the old
base mesh. The order and nature of the interpolation need further investigation and we
are studying methods that conserve, e.g., fluxes (cf. Berger [14] or Rai [29]).


III. COMPUTATIONAL EXAMPLES. In order to demonstrate the capabilities of
the adaptive procedure described in Section II, we applied it to three hyperbolic systems.
We used a two-step MacCormack finite difference method (cf. Arney and Flaherty [5],
Hindman [26], or MacCormack [27]) to integrate the partial differential equations and
Richardson’s extrapolation (cf. Arney [4] or Berger and Oliger [15]) to indicate local
errors. Base mesh geometry was prescribed as indicated in each example. If the base
mesh time step failed to satisfy the Courant-Friedrichs-Lewy condition, it was automatically
reduced to the maximum allowed by that condition (cf. Arney [4] and Arney
and Flaherty [6]). This procedure should also satisfy the Courant condition on all subgrids
when the characteristic speeds vary slowly.

Numerical  results  obtained  on  uniform  stationary  grids  are  compared  with  those 
obtained  by  adaptive  strategies  that  use  (i)  mesh  moving  only,  (ii)  local  refinement  only, 
and  (iii)  the  combination  of  mesh  moving  and  refinement  discussed  in  Section  II.  The 
examples are designed to determine the relative cost, accuracy, and efficiency of our adaptive
algorithm and each of its components. Accuracy is appraised by computing the
difference e between the exact and numerical solutions of a problem in either the maximum
or L1 norms, i.e., by computing either


$$ \| e(\cdot, t) \|_\infty := \max_{1 \le i \le K} \; \max_{1 \le j \le m} | e_j(x_i, y_i, t) |, \tag{7a} $$

or

$$ \| e(\cdot, t) \|_1 := \iint_\Omega P \sum_{j=1}^{m} | e_j | \, dx \, dy, \tag{7b} $$


respectively.  Here,  K  is  the  number  of  nodes  in  the  mesh  at  time  t  and  P  is  a  piecewise 
constant  interpolation  operator  with  respect  to  the  cells  of  the  base  mesh  that,  on  each  cell, 
has  the  average  value  of  the  errors  at  the  vertices  of  the  cell.  We  use  either  the  total  CPU 
time or the maximum number of nodes used in a base time step as measures of the computational
complexity of a procedure. All calculations were performed in double precision
arithmetic  on  an  IBM  3081/D  computer  at  the  Rensselaer  Polytechnic  Institute. 
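
The two error measures can be sketched as follows for the simplest setting. The code assumes a uniform rectangular mesh with spacings `dx` and `dy` and nodal errors stored in nested lists; the adaptive meshes of the paper are not uniform, so this shows only how the operator P (cell average of the vertex errors) enters the L1 norm. Both function names are ours.

```python
def max_norm(errors):
    """(7a): largest |e_j| over all nodes i and solution components j.

    errors[i][j] is component j of the error at node i.
    """
    return max(abs(e) for node in errors for e in node)


def l1_norm_uniform(abs_err, dx, dy):
    """(7b) on a uniform rectangular mesh.

    abs_err[r][c] holds the sum over components of |e_j| at node (r, c).  On each
    cell the integrand is the average of the four vertex values (the operator P),
    multiplied by the cell area.
    """
    total = 0.0
    for r in range(len(abs_err) - 1):
        for c in range(len(abs_err[0]) - 1):
            cell_avg = (abs_err[r][c] + abs_err[r][c + 1]
                        + abs_err[r + 1][c] + abs_err[r + 1][c + 1]) / 4.0
            total += cell_avg * dx * dy
    return total


unit_error = [[1.0] * 3 for _ in range(3)]      # error of 1 at every node of a 3 x 3 mesh
```

For a unit error on the unit square (`dx = dy = 0.5`), `l1_norm_uniform(unit_error, 0.5, 0.5)` sums four cells of area 0.25 each.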

Solutions are displayed by drawing either level lines or wire-frame perspective renditions.
Meshes are displayed by showing the complete two-dimensional spatial discretization
at specified times with finer subgrids overlaying coarser ones. This portrayal does not
show the reduced time steps that are used for the subgrid calculations. The broken-line
rectangles in the figures indicate the error cluster(s) that are used to move the base mesh.

Example  1.  Consider  the  linear  initial-boundary  value  problem  that  was  proposed  as 
a  test  problem  by  McRae  et  al.  [28]: 


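$$ u_t - y u_x + x u_y = 0, \quad t > 0, \quad (x, y) \in \Omega, \tag{8a} $$

$$ u(x, y, 0) = \begin{cases} 0, & \text{if } (x - 1/2)^2 + 1.5 y^2 \ge 1/16, \\ 1 - 16((x - 1/2)^2 + 1.5 y^2), & \text{otherwise}, \end{cases} \quad (x, y) \in \Omega \cup \partial\Omega, \tag{8b} $$

and

$$ u(x, y, t) = 0, \quad t > 0, \quad (x, y) \in \partial\Omega, \tag{8c} $$

where Ω := { (x, y) | -1.2 < x, y < 1.2 }.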

The exact solution of (8) is an elliptical cone that rotates about the origin in the
counterclockwise direction with period 2π. It can be written in the form

$$ u(x, y, t) = \begin{cases} 0, & \text{if } C < 0, \\ C, & \text{if } C \ge 0, \end{cases} \tag{9a} $$

where

$$ C = 1 - 16 \left[ (x \cos t + y \sin t - 1/2)^2 + 1.5 (y \cos t - x \sin t)^2 \right]. \tag{9b} $$

Five adaptive and uniform mesh solutions of (8) were calculated for 0 < t ≤ 3.2 and
our findings are summarized in Table 1. Solutions 3 and 4, with refinement, were calculated
using an error tolerance of 0.0002 and a maximum of two levels of refinement. The
tolerance and maximum level of refinement were selected so that the high-error region
under the cone would maintain approximately the same mesh spacing as the uniform mesh
used to obtain Solution 5. The grids that were used to obtain Solution 4 are shown in
Figure 5 at t = 0.56, 1.68, 2.24, and 3.2. A new base mesh was introduced at t = 2.82.
The meshes that were used to obtain Solutions 2, 3, and 4 at t = 3.2 are shown in Figure
6. Finally, surface and contour plots of Solutions 1, 2, and 3 and of Solutions 4 and 5 at
t = 3.2 are shown in Figures 7 and 8, respectively.

Solution 1 bears no resemblance to the exact solution and demonstrates the devastating
effects of large dissipative and dispersive errors. Solution 2, with mesh moving only,
provides a dramatic improvement in the results for approximately one-half the cost of
using both mesh motion and refinement. Solution 5 took more than three times longer to
Table 1. Errors at t = 3.2 and computational costs for five solutions of Example 1.


calculate than Solution 4 for approximately the same accuracy, thus demonstrating the
efficiency of the refinement process. The subgrids for the refined Solutions 3 and 4 are
concentrated in the region of the cone and are aligned with its principal axes as it rotates.
Dissipative and dispersive errors cause a "wake" of spurious oscillatory information to follow
the moving cone (cf. Figures 7 and 8). Some mesh refinement is performed in the
wake region and this greatly reduces the magnitude of the oscillations.

Example  2.  Consider  the  uncoupled  linear  initial-boundary  value  problem 

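$$ (u_1)_t + (u_1)_x = 0, \quad (u_2)_t - (u_2)_x = 0, \quad t > 0, \quad (x, y) \in \Omega, \tag{10a} $$

$$ u_1(x, y, 0) = \begin{cases} 1 - 16((x - 1/2)^2 + 1.5 y^2), & \text{if } (x - 1/2)^2 + 1.5 y^2 \le 1/16, \\ 0, & \text{otherwise}, \end{cases} \quad (x, y) \in \Omega \cup \partial\Omega, \tag{10b} $$

$$ u_2(x, y, 0) = \begin{cases} 1 - 16((x + 1/2)^2 + 1.5 y^2), & \text{if } (x + 1/2)^2 + 1.5 y^2 \le 1/16, \\ 0, & \text{otherwise}, \end{cases} \quad (x, y) \in \Omega \cup \partial\Omega, \tag{10c} $$

$$ u_1(x, y, t) = u_2(x, y, t) = 0, \quad t > 0, \quad (x, y) \in \partial\Omega, \tag{10d} $$

and Ω := { (x, y) | -1 ≤ x ≤ 1, -0.6 ≤ y ≤ 0.6 }.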

The  solution  of  this  problem  consists  of  two  moving  cones  that  collide  and  pass 
through each other. We selected it in order to determine how the various adaptive strategies
could cope with interacting phenomena.
Figure 5. Grids created for Solution 4 of Example 1 at t = 0.56 (upper left),
1.68 (upper right), 2.24 (lower left), and 3.2 (lower right).

Figure 6. Grids created for Solutions 2, 3, and 4 of Example 1 at t = 3.2.

Figure 8. Surface and contour plots for Solutions 4 (top) and 5 (bottom) at
t = 3.2 of Example 1.


One uniform mesh and three adaptive solutions of (10) were calculated for
0 < t ≤ 1.2 and our findings are summarized in Table 2. The solutions involving
refinement  were  computed  with  a  tolerance  of  0.0038.  All  solutions  were  designed  to 
have  approximately  the  same  accuracy.  The  grids  that  were  used  to  obtain  Solution  4  are 
shown  in  Figure  9  at  t  =0,  0.23,  0.46,  0.92,  and  1.2. 

The  results  of  Table  2  demonstrate  the  efficiency  of  the  mesh  moving  strategy  on  this 
example.  Solution  2  with  mesh  moving  was  slightly  more  accurate  than  Solution  1 
obtained  on  a  uniform  mesh,  and  it  required  less  than  one-half  of  the  computation  time. 
Solution  3  with  refinement  on  a  stationary  mesh  shows  only  a  modest  improvement  over 
Solution 1; however, the combination of mesh moving and refinement computed in Solution
4 again shows a significant gain in efficiency. We suspect that the high accuracy
achieved  by  mesh  moving  on  this  example  is  due  to  the  reduction  in  dispersive  errors  that 
results  when  the  mesh  follows  the  cones  with  approximately  the  correct  velocity. 

Example  3.  Consider  the  Euler  equations  for  a  perfect  inviscid  compressible  fluid 

$$ \mathbf{u}_t + \mathbf{f}_x(\mathbf{u}) + \mathbf{g}_y(\mathbf{u}) = 0, \tag{11a} $$

Ref.  Strategy                          Base     ||e||_1   ||e||_∞   CPU Time
No.                                     Mesh                         (sec.)
----  --------------------------------  -------  --------  --------  --------
 1    Stationary uniform mesh           64x34    0.066     0.26      710
 2    Moving mesh                       44x20    0.056     0.18      340
 3    Stationary mesh with refinement   44x20    0.055     0.23      719
 4    Moving mesh with refinement       44x20    0.039     0.16      609

Table 2. Errors at t = 1.2 and computational costs for four solutions of Example 2.


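where

$$ \mathbf{u} = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \quad \mathbf{f}(\mathbf{u}) = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ u(e + p) \end{bmatrix}, \quad \mathbf{g}(\mathbf{u}) = \begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ v(e + p) \end{bmatrix}. \tag{11b,c,d} $$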

Here, u and v are the velocity components of the fluid in the x and y directions, ρ is the
fluid density, e is the total energy of the fluid per unit volume, and p is the fluid pressure.
For an ideal gas

$$ p = (\gamma - 1) \left[ e - \rho (u^2 + v^2)/2 \right], \tag{11e} $$

where γ is the ratio of the specific heat at constant pressure to that at constant volume.
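
The flux vectors and the ideal-gas relation translate directly into code. The sketch below is an illustration in terms of the conserved variables (ρ, ρu, ρv, e); the function names are ours. As a consistency check it evaluates the pressure for the post-shock state used in the Mach 10 example that follows (ρ = 8.0, u = 4.125√3, v = -4.125, e = 563.5), which should reproduce the quoted pressure of 116.5.

```python
import math

GAMMA = 1.4  # ratio of specific heats for air, as in the example


def pressure(rho, rho_u, rho_v, e, gamma=GAMMA):
    """Ideal-gas pressure (11e) from the conserved variables."""
    u, v = rho_u / rho, rho_v / rho
    return (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))


def fluxes(rho, rho_u, rho_v, e, gamma=GAMMA):
    """Return the flux vectors f(u) and g(u) of (11c, d) as tuples."""
    u, v = rho_u / rho, rho_v / rho
    p = pressure(rho, rho_u, rho_v, e, gamma)
    f = (rho_u, rho_u * u + p, rho_u * v, u * (e + p))
    g = (rho_v, rho_u * v, rho_v * v + p, v * (e + p))
    return f, g


# Post-shock state: rho = 8, u = 4.125*sqrt(3), v = -4.125, e = 563.5.
p_post = pressure(8.0, 8.0 * 4.125 * math.sqrt(3.0), 8.0 * -4.125, 563.5)
```

The same function gives p = 1.0 for the quiescent state ρ = 1.4, u = v = 0, e = 2.5.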

We solve a problem where a Mach 10 shock in air (γ = 1.4) moves down a channel
containing a wedge with a half-angle of thirty degrees. This problem was used by Woodward
and Colella [32] to compare several finite difference schemes on uniform grids.
Like them, we orient a rectangular computational domain, -0.3 ≤ x ≤ 3.4, 0 ≤ y ≤ 1, so
that the top edge of the wedge is on the bottom of the domain in the interval y = 0,
1/6 ≤ x ≤ 3.4. Thus, in the computational domain it appears as though a Mach 10 shock is
impinging on a flat plate at an angle of sixty degrees. The initial conditions that are
appropriate for this situation are

$$ \rho = 8.0, \quad p = 116.5, \quad e = 563.5, \quad u = 4.125\sqrt{3}, \quad v = -4.125, \quad \text{if } y > \sqrt{3}\,(x - 1/6), \tag{12a} $$

and

$$ \rho = 1.4, \quad p = 1.0, \quad e = 2.5, \quad u = 0, \quad v = 0, \quad \text{if } y \le \sqrt{3}\,(x - 1/6). \tag{12b} $$

Figure 9. Grids created for Solution 4 of Example 2 at t = 0, 0.23, 0.46, 0.92,
and 1.2 (top to bottom).


Along the left boundary (x = -0.3) and the bottom boundary to the left of the wedge
(y = 0, -0.3 ≤ x < 1/6), we prescribe Dirichlet boundary conditions according to (12);
along the top boundary (y = 1), values are prescribed that describe the exact motion of an
undisturbed Mach 10 shock; along the right boundary (x = 3.4), all normal derivatives are
set to zero; and along the wedge (y = 0, 1/6 ≤ x ≤ 3.4) reflecting boundary conditions are
used.


The solution of this problem is a complex self-similar structure called a double-Mach
reflection that was described in Ben-Dor and Glass [12, 13]. Two reflected Mach shocks
form with their associated Mach stems and contact discontinuities. The geometry of these
structures is very fine and is primarily confined to a small region that moves along the
wedge with the incident shock. One of the two contact discontinuities is so weak that it is
usually not noticed in computations.

The  MacCormack  finite  difference  scheme  needs  artificial  viscosity  to  "capture" 
shocks  without  excessive  oscillations.  We  used  a  model  developed  by  Davis  [20]  which  is 
total  variation  diminishing  in  one  space  dimension. 

Five solutions of this problem were calculated for 0 < t ≤ 0.19 as indicated in Table
3. Refinement was restricted to a maximum of two levels and a tolerance of 0.6 in the
maximum norm was prescribed. A pointwise error indicator based on the assumption of
smooth solutions, like the present one, is not appropriate for problems having discontinuities.
Without restricting the maximum level of refinement, we could refine indefinitely in
the vicinity of a discontinuity.


Ref.  Strategy                          Base         Max. No.     CPU Time
No.                                     Mesh         Nodes        (sec.)
----  --------------------------------  -----------  -----------  --------
 1    Stationary uniform mesh           63x29        1827         2130
 2    Moving mesh                       [illegible]  [illegible]  2220
 3    Stationary mesh with refinement   29x11        [illegible]  3254
 4    Moving mesh with refinement       29x11        3540         3725
 5    Stationary uniform mesh           120x40       4800         6861

Table 3. Maximum number of nodes in any base time step and computational
costs for five solutions of Example 3.


Solutions  2  through  5  were  intended  to  be  of  comparable  accuracy  and  we  shall 
attempt  to  appraise  the  computational  cost  of  each  adaptive  strategy.  The  maximum 
number  of  nodes  that  was  introduced  in  any  base  time  step  and  the  total  CPU  time  are 
presented as measures of computational complexity in Table 3. Contours of the density at
t = 0.19 are shown for all five solutions in Figure 10 and the grids that were generated for
Solution 4 at t = 0.038, 0.076, 0.114, 0.152, and 0.19 are shown in Figure 11.

As in the previous two examples, the mesh-moving strategy of Solution 2 does a
great deal to improve the results of the static Solution 1 for approximately a five-percent
increase in computational cost. Comparing the top two contours of Figure 10, we see that
the resolution of the incident and reflected shocks is much finer with Solution 2 than with
Solution 1. Additional detail of the structures in the Mach stem region and of the contact
discontinuities is present in Solution 2, but not in the nonadaptive Solution 1. Finally,
Solutions 1 and 5 display more oscillatory behavior behind the incident shock near the
upper boundary. This is undoubtedly due to our maintaining a discontinuity where the
shock intersects the upper boundary.

The  use  of  refinement  on  a  stationary  mesh  again  does  not  give  the  dramatic 
improvement  obtained  by  mesh  moving  (cf.  the  second  and  third  contours  of  Figure  10). 
Initially the fine meshes were following the incident and reflected shock structures and
better results were obtained; however, by t = 0.19 refinement is being performed over
much of the domain and two levels of refinement are not sufficient for adequate resolution
(cf. Arney and Flaherty [6]). The combination of mesh motion and refinement depicted by
Solution  4  in  Figure  10  provides  a  marked  improvement  in  resolution.  The  sequence  of 
meshes  shown  in  Figure  11  shows  that  the  coarse  mesh  is  able  to  follow  the  differing 
dynamic  structures  and  that  refinement  is  only  performed  in  the  vicinity  of  discontinuities. 
Initially, only one rectangular cluster is needed to follow the incident shock (cf. Arney
and  Flaherty  [5]).  As  time  progresses,  two  clusters  are  created  in  order  to  follow  the 
incident  and  reflected  shocks  (cf.  the  upper  three  meshes  of  Figure  11).  A  third  cluster  is 
created  as  time  increases  further  in  order  to  follow  the  evolving  activity  in  the  region  of 
the  Mach  stem  (cf.  the  lower  two  meshes  of  Figure  11). 

Severe  distortion  of  the  mesh  in  the  reflected  shock  region  caused  a  static  mesh 
regeneration  to  occur  for  Solution  4  at  t  =  0.162.  The  base  meshes  before  and  after  the 
static regeneration are shown in Figure 12. Thus, Solution 4 demonstrates all of the capabilities
of our adaptive procedure. The results presented in Table 3 and Figure 10 also
show  that  Solution  4  provided  greater  resolution  than  the  uniform  mesh  Solution  5  for 
approximately one-half of the cost. Solution 4 also shows many of the same characteristics
as the solution computed by Woodward and Colella [32] using MacCormack’s method
on  a  240  x  120  uniform  grid.  We  were  unable  to  compute  a  solution  on  such  a  fine  mesh 
due  to  virtual  memory  limitations  on  our  computer;  however,  we  estimate  that  it  would 
have  used  14,400  nodes  and  40,000  CPU  seconds. 

The results presented for this problem demonstrate the power and efficiency of our
adaptive techniques; however, we would have preferred to allow more than two levels of
refinement and a finer base mesh. These calculations would have produced better resolution
of the discontinuities and other fine-scale structures and would have further demonstrated
the computational advantages of adaptive methods relative to uniform mesh techniques. As noted,
restrictions of our computing environment prevented us from doing this in a reasonable
manner. We hope to perform these calculations in the future using a larger computing
system.


IV.  DISCUSSION  OF  RESULTS  AND  CONCLUSIONS.  We  have  described  an 
adaptive  procedure  for  solving  systems  of  time-dependent  partial  differential  equations  in 
two space dimensions that combines existing mesh moving [5] and local refinement [6]
techniques.  The  algorithm  also  contains  procedures  for  initial  mesh  generation  and  static 
mesh  regeneration.  It  can  be  used  with  a  wide  variety  of  finite  difference  or  finite  element 
schemes  and  error  indicators. 
Figure 10. Contours of the density at t = 0.19 for Solutions 1 to 5 (top to bottom)
of Example 3.

Figure 11. Grids created for Solution 4 of Example 3 at t = 0.038, 0.076, 0.114,
0.152, and 0.19 (top to bottom).

Figure 12. Base grids before (top) and after (bottom) the static mesh regeneration
that was performed for Solution 4 of Example 3 at t = 0.162.


We  obtained  computational  results  for  hyperbolic  systems  of  conservation  laws  by 
using  our  adaptive  methods  with  a  MacCormack  finite  difference  scheme  and  using 
Richardson’s  extrapolation  to  furnish  local  error  indicators.  Our  computational  results  on 
three examples indicate that mesh moving can significantly reduce errors for approximately
a ten-percent increase in cost relative to computations performed on stationary uniform
meshes. The use of local refinement without mesh moving provided increased
efficiency relative to uniform-mesh calculations, although not as dramatic as those found
using mesh moving. The combination of mesh moving and local refinement provided reliable
results while costing significantly less than stationary-mesh calculations. Thus, the
overhead  associated  with  the  dynamic  data  structures  is  less  than  the  time  to  calculate  a 
comparable  solution  on  a  uniform  mesh. 

The results of Section III and others (cf. Arney and Flaherty [5, 6]) indicate that our
mesh moving procedures perform better alone than with refinement. This is because the
projection of fine-mesh solutions onto coarser meshes reduces the errors at base mesh
nodes and mesh motion based on controlling small or zero local discretization errors either
fails or results in no movement. Erratic mesh motion can also occur with some techniques
when movement indicators are small. This topic is discussed in Coyle et al. [19] and a
possible remedy for one-dimensional problems is suggested in Adjerid and Flaherty [2].
Further experimentation and analysis are being performed in order to determine the best
way to combine mesh moving and refinement.

There are several other ways to improve the efficiency, reliability, and robustness of
our adaptive methods. The present error indicator based on Richardson’s extrapolation is
expensive and we are seeking ways of replacing it by techniques using p-refinement.
Such methods have been shown [1, 2, 3, 8, 10, 16, 22] to have an excellent cost-performance
ratio when used in conjunction with finite element methods. An appropriate error
indicator or estimator can be used to control a differential refinement algorithm, where
different refinement factors (i.e., other than binary) are used in different high-error clusters.
If the error indicator is capable of providing separate estimates of the spatial and
temporal errors, as the present one does, then different refinement factors can also be used
in space and time. We also hope to demonstrate the flexibility of our refinement procedure
by using it with a finite difference or finite element scheme for parabolic problems.

The  greater  reliability  and  efficiency  of  adaptive  techniques  will  be  most  beneficial  in 
three  dimensions.  These  techniques  must  be  able  to  take  advantage  of  the  latest  advances 
in  vector  and  parallel  computing  hardware.  The  tree  is  a  highly  parallel  structure  and  we 
have been developing solution procedures that exploit this in a variety of parallel computing
environments.
VORTEX FISSION AND FUSION


Karl  Gustafson 
Boulder, Colorado USA

Abstract. Vortex fission-fusion sequences are found in full Navier-Stokes
flow at higher Reynolds numbers and higher aspect ratios. Some of these
represent transient bifurcations whereas others persist, indicating Hopf bifurcations
in the final states. Such sequences appear to be initiated by a sublayer
viscous-inviscid bursting effect caused by a wall-eddy development near a
separation point. Parity rules and the relative proximities of provocation points
and obstructions play fundamental roles in the further evolution of the vortex
shedding and coalescence dynamics. As intensities drop off sharply in the secondary
structures, numerical resolution considerations become paramount. For
the latter a new multigrid localization procedure currently under development
has proven to be remarkably robust.


1.  INTRODUCTION 

Recent studies (see Benjamin and Mullin [1], Cliff and Mullin [2], Bolstad and
Keller [3], and the references therein) have been concerned with questions of
flow multiplicity higher than previously expected in the Taylor Problem of flow
between rotating cylinders. For the most part these studies consider steady cellular
flows at Reynolds numbers reasonably near those at which the Taylor vortices
appear. Quoting


"[1, p. 219] A prime contention of the previous discussions has been
that although the realistic hydrodynamic problem modelling the
Taylor experiment is yet unsolved in closed form, it must have a
high multiplicity of isolated solutions when R lies well above the
quasi-critical range wherein Taylor cells are first easily demonstrable
by standard flow-visualization techniques."

"[2, p. 256] A striking feature of anomalous modes, particularly those
with a larger number of cells, is the distortion of the cell boundary
adjacent to the anomalous cell."

"[3, p. 16] A new phenomenon is ... the splitting of the extra vortices
into two smaller vortices."

Re [1], while admitting that I have only recently become aware of these
new higher multiplicities found for the Taylor problem, nonetheless I would first
like to advance here the hypothesis that in some cases the end effects in the
Taylor Problem imply even higher multiplicities that just haven't been found
yet, in some situations infinite multiplicities. All of this depends on the exact
experimental or numerical model employed, but when a corner with no slip conditions
prevailing is encountered, or a corner with slip conditions on one side
only, e.g., an intersurface separation interface, and when the angle is not too
large, one should expect an infinite set of smaller vortices descending into it.
An example of a sequence of ten of these that we have found in a corner will be
given below.

A second thought I advance here is that the existence of higher multiplicities in
real flow depends more on certain "parity rules" established by the fluid during
its actual dynamic evolution, than on the bifurcation parameter homotopy
arguments followed in [1,2,3]. The latter "homotopy model" is a valuable technique
in connection with the numerical continuation methods used in [1,2,3] to
enable the tracking of the "full" bifurcation diagram as, say, the Reynolds
number Re or the aspect ratio A is varied. But in the end it would appear
to be limited to the analyses of the steady flow equations and can therefore generate
mathematically valid but physically spurious solutions. I will illustrate
below the development of such a "parity rule" structure governing a full
Navier-Stokes flow. Moreover, as will be seen, the parity rules explain the cell
boundary distortion referred to in [2].

Finally, I will illustrate the mechanisms of the splitting of vortices into smaller
vortices. This can occur [6] as a function of the varying of the key parameters
(e.g., Re, A) of the problem in a steady flow as in [3] but more interestingly is
found to occur dynamically [6,7] in unsteady flow, with both splitting and
coalescence sequences found.


2.  END  EFFECTS  AND  CORNER  VORTEX  SEQUENCES 

As pointed out in [2, p. 257], the anomalous modes are not surprising, should be
expected, and are due to the end effects on the Taylor annulus.

In [4,5] we concentrated on finding similar "anomalous modes" for corner flow in
a driven unit cavity, and thus far have succeeded in finding twenty of them.
There are (mathematically) an infinite number there, although (computationally)
they will, depending on the precision carried, drop into the noise level
because their intensities fall off as $O(10^{-4})$, and (physically) experimentally only
three or four at most have been seen. For full details about this interesting
problem see [8,9]. Here are the first 10 corner modes reported in [4], measured
both by stream function intensity $\psi$ and in terms of the zeros $z_i$ between
them on the 45° diagonal angle bisector extending out from the lower left
corner of the cavity. There are such vortex sequences in other cavity corners
but we will omit discussion of those here. The sign changes on the $\psi$ intensities
are in accordance with the parity rules I will discuss next.


Local Maximum Stream        Stream Function Zero
Function Intensity          Measured Along Diagonal

  1.0006 × 10^-1              6.97  × 10^-2
 -2.232  × 10^-6              4.205 × 10^-3
  6.165  × 10^-11             2.534 × 10^-4
 -1.703  × 10^-15             1.536 × 10^-5
  4.71   × 10^-20             9.247 × 10^-7
 -1.30   × 10^-24             5.602 × 10^-8
  3.59   × 10^-29             3.370 × 10^-9
 -9.93   × 10^-34             2.010 × 10^-10
  2.75   × 10^-38             1.236 × 10^-11
 -7.59   × 10^-43             7.421 × 10^-13
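
The near-geometric fall-off in the table above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the values are transcribed from the table, and the exponents of the second and third intensities are an assumption inferred from the otherwise constant decay ratio, since they are not legible in the source.

```python
# Corner-mode data transcribed from the table. The exponents of the second
# and third intensities are inferred from the constant decay ratio of the
# later entries (an assumption, not a value read directly from the source).
intensities = [1.0006e-1, -2.232e-6, 6.165e-11, -1.703e-15, 4.71e-20,
               -1.30e-24, 3.59e-29, -9.93e-34, 2.75e-38, -7.59e-43]
zeros = [6.97e-2, 4.205e-3, 2.534e-4, 1.536e-5, 9.247e-7,
         5.602e-8, 3.370e-9, 2.010e-10, 1.236e-11, 7.421e-13]


def successive_ratios(values):
    """Magnitude of each entry divided by the magnitude of its predecessor."""
    return [abs(later / earlier) for earlier, later in zip(values, values[1:])]


# Ratio of each eddy intensity to the previous one.
intensity_ratios = successive_ratios(intensities)
# Ratio of successive zero positions along the corner diagonal.
zero_ratios = successive_ratios(zeros)
# Neighbouring eddies counter-rotate, so the sign alternates down the column.
signs_alternate = all(a * b < 0 for a, b in zip(intensities, intensities[1:]))

if __name__ == "__main__":
    for k, (ri, rz) in enumerate(zip(intensity_ratios, zero_ratios), start=1):
        print(f"mode {k} -> {k + 1}: intensity ratio {ri:.3e}, zero ratio {rz:.4f}")
```

Apart from the first step, which involves the primary vortex, each intensity is a nearly fixed small fraction of its predecessor, and the zeros contract by a nearly fixed factor. This is why the sequence drops below any fixed working precision after a limited number of eddies.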


3.  PARITY  RULES  AND  PROXIMITY  LIMITATIONS 

As pointed out in [3, p. 4], the demonstration of additional "hidden" vortices
removes all difficulties with "wrong" odd numbers of vortices found in previous
experiments.

Such "hidden" very weak vortices are known in the aerodynamic literature as
"ornamentation" vortices. I preferred the term "intermediating" vortices in [8]
to indicate that they are not ornamental in any sense of the word but are in
fact topological necessities to the flow. Aerodynamics is characterized by open
regions and often the smaller vortices do indeed flow away, but in a closed flow
such as the Taylor geometry of [1,2,3] or in the cavity geometry of [6,7] they do
not disappear once they have managed to enter the flow. Whether or not they
can enter appears to depend not only on their parity but also on the proximity
of their potential development region to ends, corners, walls, and even to intersurface
separation lines. Here are some details of their evolution as reported in
[6,7]. Note that the deformed cell contours occur very naturally in terms of the
parity signs plotted along their boundaries.

[Figure 1 panels: (c) t = 60 seconds; (d) t = 75 seconds]

Figure 1. Separation Region "Lubrication" Dynamics

(a) Fission into 3, almost 4, tertiary eddies for "self lubrication".

(b) Right "corner lubrication" begins, to continue the energy cascade.

(c) The last two eddies report in, causing "temporary mass confusion".

(d) "Final Resting Place", as the basic final flow topology is determined.

[Figure 2 panels: (label illegible), 57 sec, 360 sec]

Figure 2. Wall, corner, provocation, separating, and intermediating
effects in cavity flow dynamics.

[Figure 3 panels: 359 sec, 360 sec]

Figure 3. Final waviness of vortex dynamics at Re = 10000 in a
cavity of depth 2.

4. INTERMITTENT BIFURCATION AND HOPF BIFURCATIONS


As can be seen from Figures 1, 2, 3 above, for full Navier-Stokes flow at
Re = 10,000 and in a depth 2 cavity, there are many intermittent bifurcations
(e.g., wall bursting, the splitting-coalescence sequences) that appear during a
dynamic development of a flow. These must be taken into account in any
attempt at any physically correct understanding of any final steady flow bifurcation
diagram. Some of these sequences are transient, thus "ornamental" in
one sense, yet their temporary existence is absolutely essential in the "mediating"
of the development of a final flow pattern.

The full flow history (see [6,7]) of the Re = 10,000 cavity dynamics indicates
that the repeating pattern at the left midwall represents a final periodic solution.
Our results thus imply the existence of a Hopf bifurcation, for flow in a
depth 2 cavity, at some critical Reynolds number Re strictly between 2000 and
10,000. One should distinguish this Hopf bifurcation of the discretized equations
from a claim for the cases of the continuous equations and the actual physics.
Moreover, the aspect ratio A = depth/width enters as a second bifurcation
parameter of considerable importance. That is, holding Re = 10,000 as
studied in [7], from [6,7] there is indicated a Hopf bifurcation at some critical A
strictly between 1 and 2. A recent study [10] found no periodic solution for flow
at Re = 10,000 in a unit (A = 1) cavity.

In addition to the pronounced oscillation on the left, we noted also very small
tertiary nonstationarities at three points along the right wall: (i) just below the
upper right corner, where there earlier was a definite tertiary eddy; (ii) just
below the right midwall, where the fluid separates into the upper and lower
regions; (iii) at the top of the lower right corner eddy, see Figure 3. Whether or
not these represent minor numerical or fluid instabilities or very small tertiary
features of a periodic final solution is a further interesting question. One
would imagine a first Hopf bifurcation at some $Re_1$ in which the flow would
settle into an oscillation tracking the principal separation point movement on
the left wall, and then higher critical $Re_2$, $Re_3$, $Re_4$, at which one or more of
the just mentioned three tertiary features may maintain itself into the final
state.


5.  SOME  NUMERICAL  CONSIDERATIONS 

Due to the multidirectional nature of the unsteady flow dynamics in a cavity,
any incorporation of upwinding could result in a computational quagmire and
would likely distort the unsteady flow transient details. On the other hand, in
employing a forward Euler-MAC scheme as we did in [6,7], care is needed in
choosing the discrete steps $\delta t$, $\delta x$, and $\delta y$ to assure both stable time integration
and adequate spatial resolution of the dynamics under consideration. We
required that the ratio $4(\delta t)\nu/(\delta x)^2$ not exceed a critical value $K(\nu)$, where we
have taken $\delta x = \delta y$. Because the equations are nonlinear, $K(\nu)$ must in general
be determined experimentally by simulation runs on a coarse grid. For
example the value $\nu = 0.01$ yielded $K(\nu) \approx 1.0$, while $\nu = 0.0025$ yielded
$K(\nu) \approx 0.36$. Our experiments with $K(\nu)$ indicated that $K(\nu) \to 0$ as $\nu \to 0$
in a nonlinear way. Instability, when it occurred, manifested itself first in the
downstream left lid corner, resulting quickly in a subsequent dissolution of the
primary vortex accompanied by a rapid buildup of large pressure gradients. A
well established primary vortex appeared to guarantee stability thereafter.
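
The step-size restriction above can be turned into a small helper. This is a minimal sketch under stated assumptions: the function name and the example grid spacing are hypothetical, and only the two $(\nu, K(\nu))$ pairs quoted in the text are taken from the source.

```python
def max_stable_dt(nu, dx, k_crit):
    """Largest time step satisfying 4 * dt * nu / dx**2 <= k_crit.

    nu      kinematic viscosity
    dx      grid spacing (dx = dy is assumed, as in the text)
    k_crit  empirically determined critical value K(nu)
    """
    if nu <= 0.0 or dx <= 0.0 or k_crit <= 0.0:
        raise ValueError("nu, dx and k_crit must all be positive")
    return k_crit * dx * dx / (4.0 * nu)


# The two (nu, K(nu)) pairs quoted in the text.
reported_k = {0.01: 1.0, 0.0025: 0.36}

if __name__ == "__main__":
    dx = 1.0 / 40.0  # hypothetical grid spacing, chosen only for illustration
    for nu, k_crit in reported_k.items():
        print(f"nu = {nu}: dt <= {max_stable_dt(nu, dx, k_crit):.5f}")
```

Because $K(\nu)$ itself shrinks as $\nu$ decreases, the admissible step does not grow like $1/\nu$ as a fixed-constant bound would suggest; $K(\nu)$ has to be re-measured on a coarse grid for each viscosity.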


Proper resolution of boundary layer effects at high Reynolds numbers requires in
general very fine discretizations. However, the flow velocities near the midwall
and corner vortex structures are considerably smaller than the driving velocity.
For $Re = UL/\nu = 10{,}000$, at the final time t = 360 seconds the effective local
Reynolds number in the left midwall region was found to be about

$(Re)_{\text{effective}} = \dfrac{U'L}{\nu} = \dfrac{(0.1)(1)}{10^{-4}} = 1000.$

Examination of the velocity matrices during the whole time history revealed
corresponding or smaller values of $(Re)_{\text{effective}}$ in the neighborhoods of all
secondary vortex structures. In particular, the transient wall vortex separation
dynamics, i.e., bursting and separation point movement on the left wall, would
appear to be accurately resolved.


To obtain the very high resolution steady Stokes flow corner subvortices (20 of
them thus far) found in [4,5], care to avoid underflow and to maintain good subdomain
residuals required several innovations on the FAC multigrid schemes.
Roughly, the additional considerations arise in accurately passing information
back and forth from local to global grids when treating a problem which is
biharmonic rather than harmonic. Challenging and interesting investigations of
several proposed N-Processor (e.g., Hypercube configuration) algorithms for
simultaneous subdomain computation present themselves in this context,
although we have not yet launched such a study.


6.  OTHER  GEOMETRIES  AND  APPLICATIONS 

In another paper presented at this conference, Ghoniem [11] reports similar studies
by a different method on another model, that of flow over a backward facing
step. From our discussions at this conference, it would appear that our
results and those of Ghoniem et al. [11,12,13] support and indeed reinforce the
conclusions found separately in each of the two geometries.

Vortex splitting dynamics with properties very consistent with those that we
have found have been recently observed physically in unsteady flow over airfoils
[14]. Our methods could be applied to those and other geometries (e.g., corners
of arbitrary angle, end effects, general obstructions) by means of new high resolution
grid generation mapping schemes utilizing the same elliptic solver techniques
described herein. Other future work includes a better understanding of
the mechanisms under which higher multiplicities and cell boundary distortions
develop in a flow, and in particular how such dynamical features depend fundamentally
on the new parity rules that we have reported here.

INCIPIENT SINGULARITIES IN THE NAVIER-STOKES EQUATIONS


Alain Pumir & Eric D. Siggia
Laboratory of Atomic and Solid State Physics
Cornell  University 
Ithaca,  NY  14853 


In this paper, we examine one of the challenging phenomena in
3-dimensional incompressible fluid dynamics, namely the stretching of the
theoretical difficulties. Mathematically, it has proven to be impossible
so far to show that the solutions of the 3-dimensional Navier-Stokes
equations do not blow up when the viscosity is very small, even in the
absence of any external forces. In this respect, the 2-dimensional
situation is simpler, due precisely to the absence of any vorticity
stretching. We report here several results, which suggest that solutions
to the fluid equations can get close to a finite time singularity. Our
constructions proceed as follows. In a first part, we study several
models for the evolution of vortex filaments in an inviscid fluid. The
results suggest that vorticity blows up in a finite time. We then try
to reconstruct from the filament solutions a solution of the Euler equations,
by using asymptotic techniques. Finally, we consider the role of viscosity.

It  is  argued  that  viscosity  is  barely  able  to  prevent  the  collapse. 

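A reasonable model for the evolution of a slender vortex filament
describes a vortex tube by a curve, with an internal degree of freedom,
the core size. The evolution equation reads

(1)  $\dfrac{d\mathbf{r}(\theta,t)}{dt} = \dfrac{\Gamma}{4\pi} \displaystyle\int \dfrac{\dfrac{\partial \mathbf{r}}{\partial \theta'}(\theta',t) \times \left(\mathbf{r}(\theta,t) - \mathbf{r}(\theta',t)\right)}{\left((\mathbf{r}(\theta) - \mathbf{r}(\theta'))^2 + \sigma(\theta)^2 + \sigma(\theta')^2\right)^{3/2}}\, d\theta'$

(2)  $|\sigma^2\, ds/d\theta| = \text{cst}$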

where $\sigma$ measures the core size of the slender tube, and acts as a cutoff.
Equation (1) can be rigorously deduced in the limit of an infinitely thin
tube (namely, $\sigma \ll r_c$, where $r_c$ is the radius of curvature of the filament).
In what follows, it will be referred to as the Biot-Savart equation. $\theta$
denotes a Lagrangian parameter, and s is the arclength along the curve.

The equation for the core size, (2), ensures the local conservation of
volume of the vortex tube. We emphasize that it is an approximation,
applying when the time scale of the evolution of the vortex filament is
large compared with the time scale of the internal dynamics of the core.

With these assumptions, the vorticity scale is $1/\sigma^2$.
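
To make the filament model concrete, the sketch below evaluates the regularized Biot-Savart integral and the core-size constraint on a closed curve sampled at equally spaced values of the Lagrangian parameter. It is an illustration, not the authors' code: the quadrature (plain rectangle rule with centered-difference tangents), the sign convention (positive circulation, tangent crossed with the separation vector), and all function names are assumptions.

```python
import math


def tangents(points):
    """Centered-difference estimate of dr/dtheta on a closed curve that is
    sampled at equally spaced values of the Lagrangian parameter theta."""
    n = len(points)
    dtheta = 2.0 * math.pi / n
    return [tuple((points[(j + 1) % n][k] - points[j - 1][k]) / (2.0 * dtheta)
                  for k in range(3))
            for j in range(n)]


def core_sizes(points, volume_const):
    """Core size from the constraint sigma^2 |ds/dtheta| = volume_const."""
    return [math.sqrt(volume_const / math.sqrt(sum(c * c for c in t)))
            for t in tangents(points)]


def filament_velocity(points, sigma, gamma):
    """Rectangle-rule quadrature of the regularized Biot-Savart integral.

    Returns dr/dt at every node. The core sizes keep the denominator
    positive, so the self term (j == i) contributes exactly zero.
    """
    n = len(points)
    dtheta = 2.0 * math.pi / n
    tang = tangents(points)
    velocities = []
    for i in range(n):
        v = [0.0, 0.0, 0.0]
        for j in range(n):
            d = [points[i][k] - points[j][k] for k in range(3)]
            denom = (sum(c * c for c in d) + sigma[i] ** 2 + sigma[j] ** 2) ** 1.5
            t = tang[j]
            cross = (t[1] * d[2] - t[2] * d[1],
                     t[2] * d[0] - t[0] * d[2],
                     t[0] * d[1] - t[1] * d[0])
            for k in range(3):
                v[k] += cross[k] / denom * dtheta
        velocities.append(tuple(gamma / (4.0 * math.pi) * c for c in v))
    return velocities


def unit_ring(n):
    """Circular filament of radius 1 in the plane z = 0 (test geometry)."""
    return [(math.cos(2.0 * math.pi * j / n), math.sin(2.0 * math.pi * j / n), 0.0)
            for j in range(n)]


if __name__ == "__main__":
    ring = unit_ring(64)
    sig = core_sizes(ring, 0.01)
    print(filament_velocity(ring, sig, 1.0)[0])
```

For a circular ring the computed velocity is uniform and directed along the ring axis, which is a convenient sanity check before trying the non-planar initial curves discussed in the text.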

Starting from a simple, non planar curve, a pairing occurs and leads to
a smooth curve composed of two anti-parallel filaments (1). This new
curve stretches very rapidly and generates smaller and smaller scales.
Typically, two anti-parallel arcs of filament grow like two anti-parallel
expanding circles, until an instability develops, and folds the curve into
smaller pieces. These smaller pieces start stretching, and the process
keeps repeating itself. The radius of curvature is always significantly
larger than the core size, and the distance between the two filaments
remains of the same order as the core size. The minimum of the square of
the core size decreases according to a linear law: $\sigma^2 \propto (t^* - t)$, suggesting a
divergence of the vorticity like $1/(t^* - t)$ (with possibly logarithmic
corrections). This result also follows from naive dimensional analysis
of the vorticity equation. Various local approximations to the Biot-Savart
formula have been derived and suggest that the singularity found for
Biot-Savart is very robust, at least within a filament approximation.

The fact that, in all numerical simulations, the ratio $\epsilon = \sigma/r_c$ is small
allows us to look for solutions of the Euler equations by an asymptotic
expansion in powers of $\epsilon$. We observe that the two paired filaments look
locally like a 2-dimensional dipole. Such dipoles are known to be very
stable. The zeroth order solution in powers of $\epsilon$ is constructed with an
exact dipole solution of the 2-D Euler equations. Various corrections
come in at higher order. Our asymptotic formalism allows us to recover the
equations resulting from local approximation to the Biot-Savart equations.
However, we are unable to rule out a systematic deformation of the basic
2-D dipole, that would make the filament approximations inappropriate.

Even  so  the  reason  why  the  core  could  be  destroyed  is  rather  subtle. 

If the collapse does proceed at least partway, according to the
Biot-Savart approximation for an inviscid fluid, it is not obvious that
the viscous effects can prevent it. The viscosity, which is commonly
believed to wipe out any perturbation smaller than a dissipation scale, can
play a subtler role in this problem. A naive expression for the core size
in the presence of viscosity is:

(3)  $d\sigma^2/dt = \nu - \sigma^2\, d\ln(ds/d\theta)/dt$

The first term in the right-hand side of (3) represents fattening of
the core due to viscous diffusion. The other term is due to the stretching;
a crude estimate of it leads to: $\sigma^2\, d\ln(ds/d\theta)/dt \approx \Gamma \times f(\sigma, r_c)$, where f is a
dimensionless function of the characteristic lengths, $\sigma$ and $r_c$, say. As
$\nu$ and $\Gamma$ have the same dimension and the ratios between the different lengths
do not vary much, (3) can be approximated by $d\sigma^2/dt = \nu - \Gamma \times \text{cst}$. On the
basis of this naive argument, the viscosity may not be able to prevent
the collapse. A more rigorous analysis of the strained 2-dimensional
Navier-Stokes equations confirms that the usual Laplacian dissipation is
only marginally able to control the stretching.
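
The naive balance above has a closed-form consequence that is easy to tabulate. The sketch below integrates the approximate core-size law with a constant right-hand side; the function names and the numerical value used for the dimensionless constant are hypothetical and serve only to illustrate the two regimes.

```python
def core_area_rate(nu, gamma, c):
    """Right-hand side of the approximate law d(sigma^2)/dt = nu - c * gamma,
    where c stands for the dimensionless constant ("cst") in the text."""
    return nu - c * gamma


def collapse_time(sigma0, nu, gamma, c):
    """Time at which sigma^2 reaches zero under the constant-rate law.

    Returns None when viscous fattening balances or outweighs the
    stretching, in which case this crude model predicts no collapse.
    """
    rate = core_area_rate(nu, gamma, c)
    if rate >= 0.0:
        return None
    return sigma0 ** 2 / (-rate)


if __name__ == "__main__":
    # Hypothetical numbers: circulation 1, initial core size 0.1, c = 0.011.
    for nu in (0.001, 0.02):
        print(nu, collapse_time(0.1, nu, 1.0, 0.011))
```

In this simplified picture the outcome depends only on the sign of $\nu - \Gamma \times \text{cst}$, which is one way to read the statement that viscosity is only marginally able to control the stretching: the two terms have the same dimension and neither dominates as the core shrinks.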

A complementary numerical simulation of the full fluid equations allows
one to identify different ways the core can be destroyed. When $\epsilon$ is not
small enough, shocks may appear on the tube, or the core can be deformed
into thin sheets that roll up. Such mechanisms are the most likely
impediments to the collapse.

In any case we have worked out a situation where some strong stretching
effects do occur. Whether or not the stretching leads to a singularity
has not been proven by our study. The relevance of our particular flow to
the experimental situation is rather unclear. This study, however, sheds
some light on an important feature of 3-dimensional, incompressible
hydrodynamics.

Abstract. The initial-boundary value problem for a reaction-diffusion
equation

(*)  $u_t = u_{xx} - G(L,u)$, $0 < x < 1$, $u(0,t) = u(1,t) = 0$, $t > 0$,

$G(L,u) = 4L^2 u(u-b)(u-1)$, $0 < b < 1/2$,

was recently analyzed by Conley and Smoller. We study the large time behavior
of the semidiscrete finite element approximations, with interpolation of the
coefficients in the nonlinear terms. In Part 1 we have established that the
properties of the semidiscrete approximations are completely analogous to
those of the solutions of (*), and the asymptotic, as $t \to \infty$, optimal order
convergence has been proved. In this paper we approximate the "spontaneous"
bifurcation (with L as a parameter) for the steady-state problem. For the
semi-discrete approximations of (*) we establish error estimates that hold
uniformly on the infinite time interval $(t_0, \infty)$, $t_0 > 0$, for nonsmooth or
incompatible initial data.

1. Introduction


We are interested in numerical analysis of the nonlinear reaction-diffusion
equation

$u_t = u_{xx} - G(L,u)$, $0 < x < 1$, $t > 0$,   (1.1.a)

$u(x,0) = u^0(x)$, $0 < x < 1$,   (1.1.b)

$u(0,t) = u(1,t) = 0$, $t > 0$,   (1.1.c)

which is a paradigm example of nonlinear phenomena such as multiple steady
state solutions and bifurcation in mathematical biology.

The corresponding steady-state problem is

$u_{xx} - G(L,u) = 0$, $0 < x < 1$,   (1.2.a)

$u(0) = u(1) = 0$.   (1.2.b)

Here

$G(L,u) = 4L^2 u(u-b)(u-1)$, $0 < b < 1/2$.   (1.3)

The continuous problem was analyzed by Conley and Smoller [2] and Smoller
[10]. Their result is

Theorem 1.1 [10, Theorem 24.13]. Let G be defined by (1.3), and let
$L > L_0$. Then there are exactly three steady-state solutions
$u_i \in C^2$, $i = 0, 1, 2$, of (1.1.a), (1.1.c):
$0 \equiv u_0(x) < u_1(x) < u_2(x) < 1$, $|x| < 1$; $u_0$ and $u_2$ are attractors for (1.1.a),
(1.1.c); and the linearized operators $Q_0$ and $Q_2$, where

$Q_i v = v_{xx} - G_u(L,u_i)\,v$,

together with the boundary conditions (1.2.b), have only negative
eigenvalues. $Q_1$ has precisely one positive eigenvalue, and $u_1$ has a
1-dimensional unstable manifold which consists of orbits connecting $u_1$ to each
of the other rest points. Initial data $u(x,0)$ which satisfies
$u_1(x) < u(x,0) < u_2(x)$ (resp. $0 < u(x,0) < u_1(x)$) on $|x| < 1$ is in the
stable manifold of $u_2$ (resp. 0).

The problem (1.1) is discretized in space [4] by the finite element
method with interpolation of coefficients for the evaluation of nonlinear
terms, the so-called "product approximation".

For $I = (0,1)$ let $H^s = H^s(I)$, $H_0^s = H_0^1 \cap H^s$, for s real, and
$L^\infty = L^\infty(I)$ be the usual Sobolev spaces with the norms $\|\cdot\|_s$ and
$\|\cdot\|_{0,\infty}$, respectively; if $s = 0$, we write $\|\cdot\|_0 = \|\cdot\|$.

Let $\Delta = \{0 = x_0 < x_1 < \dots < x_{N+1} = 1\}$ be a partition of
$I = (0,1)$. Set $I_j = (x_{j-1}, x_j)$, $h_j = x_j - x_{j-1}$, $j = 1, \dots, N+1$, and
$h = \max_{1 \le j \le N+1} h_j$. It is assumed that as the meshes vary they are quasi-uniform,
i.e.,

$\max_{1 \le j \le N+1} h\, h_j^{-1} < \sigma$, for some $\sigma > 1$.

Define the finite element space

$S_0^h = M_0^k(\Delta) = \{v \in C^0(\bar I) : v(0) = v(1) = 0,\ v|_{I_j} \in P_k(I_j),\ j = 1, \dots, N+1\}$,

where for any interval $E \subset I$, $P_k(E)$ denotes the space of polynomials of
degree $\le k$ restricted to E. Every interval $I_j$, $j = 1, \dots, N+1$, is divided
into k subintervals $[x_{j-1+(i-1)/k}, x_{j-1+i/k}]$, $i = 1, \dots, k$, where the
nodes within each $I_j$ are chosen to be the Gauss-Lobatto points. We relabel
$x_{j+i/k}$ as $x_{kj+i}$, $j = 0, 1, \dots, N$, and set $M = (N+1)k - 1$.

Let $Q_h : g \in C^0(\bar I) \to Q_h g \in S^h$ be the usual interpolation operator defined
by

$(Q_h g)(x_i) = g(x_i)$, $i = 0, \dots, M+1$.


Then the continuous time Galerkin approximation with the product approximation
$u^h(x,t)$ to the solution of (1.1) is defined as a differentiable map
$u^h : [0,\infty) \to S_0^h$ such that

$(u_t^h, \chi) = -(u_x^h, \chi_x) - (Q_h G(L,u^h), \chi)$, $t > 0$, $\chi \in S_0^h$,   (1.4)

$u^h(\cdot,0)$ given in $S_0^h$.

The corresponding approximate steady-state problem is to find $u^h \in S_0^h$ from

$(u_x^h, \chi_x) + (Q_h G(L,u^h), \chi) = 0$, $\chi \in S_0^h$.   (1.5)


If $\varphi_i(x)$, $i = 1, \dots, M$, is the usual interpolatory basis for $S_0^h$, then to
compute the solutions of (1.4) and (1.5) one has to solve the problems

$\sum_{i=1}^{M} u_i'(t)\,(\varphi_i,\varphi_j) = -\sum_{i=1}^{M} u_i(t)\,(\varphi_i',\varphi_j') - \sum_{i=1}^{M} G(L,u_i(t))\,(\varphi_i,\varphi_j)$, $j = 1, \dots, M$,   (1.6)

$u_i(0)$, $i = 1, \dots, M$, given.
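
The system (1.6) can be illustrated in the simplest setting. The sketch below assumes piecewise linear elements (k = 1) on a uniform mesh, so that the mass and stiffness matrices are the familiar tridiagonal ones, and it advances in time with an implicit-explicit Euler step (implicit diffusion, explicit reaction). The time discretization, the parameter values L = 10 and b = 0.25, and all function names are assumptions made for illustration; the paper itself analyzes the continuous-time problem.

```python
import math


def G(L, u, b):
    """Cubic reaction term (1.3): G(L,u) = 4 L^2 u (u - b)(u - 1)."""
    return 4.0 * L * L * u * (u - b) * (u - 1.0)


def solve_tridiagonal(off, diag, rhs):
    """Thomas algorithm for a symmetric tridiagonal system whose diagonal
    and off-diagonal entries are constant."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def imex_step(u, L, b, h, dt):
    """One step of (M + dt K) u_new = M (u - dt G(L,u)).

    M = (h/6) tridiag(1, 4, 1) and K = (1/h) tridiag(-1, 2, -1) are the
    mass and stiffness matrices of hat functions. The product approximation
    appears as M applied to the vector of nodal values G(L, u_i).
    """
    n = len(u)
    w = [ui - dt * G(L, ui, b) for ui in u]
    rhs = []
    for j in range(n):
        left = w[j - 1] if j > 0 else 0.0
        right = w[j + 1] if j < n - 1 else 0.0
        rhs.append(h / 6.0 * (left + 4.0 * w[j] + right))
    return solve_tridiagonal(h / 6.0 - dt / h, 2.0 * h / 3.0 + 2.0 * dt / h, rhs)


def run(amplitude, L=10.0, b=0.25, n=49, dt=1.0e-3, steps=2000):
    """Evolve the initial data amplitude * sin(pi x) on n interior nodes."""
    h = 1.0 / (n + 1)
    u = [amplitude * math.sin(math.pi * (j + 1) * h) for j in range(n)]
    for _ in range(steps):
        u = imex_step(u, L, b, h, dt)
    return u


if __name__ == "__main__":
    print("small data, max |u|:", max(abs(v) for v in run(0.05)))
    print("large data, max u:  ", max(run(0.9)))
```

With these assumed parameters, small initial data decays to the zero state while large initial data settles on a positive steady state below 1. This is the qualitative picture described by the attractors of Theorem 1.1, seen here only for one mesh and one choice of L.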

For $i = 0, 1, 2$ the linearized discrete eigenvalue problems are to find
$v^h \in S_0^h$ from

$-(v_x^h, \chi_x) - (a_i v^h, \chi) = \lambda(h)\,(v^h, \chi)$, $a_i(x) = Q_h G_u(L,u_i^h(x))$, $\chi \in S_0^h$.   (1.7)

In [4] we have established the following result.


Theorem 1.2. Let G be defined by (1.3) and $L > L_0$. Then there exists
$h_0 > 0$ such that for $h < h_0$

(i) The approximate steady-state problem (1.5) has exactly three
solutions $u_i^h \in S_0^h$, $i = 0, 1, 2$, satisfying $0 \equiv u_0^h < u_1^h < u_2^h < 1 + Ch^k$ and

$\|u_i - u_i^h\|_1 \le Ch^r$, $i = 1, 2$, $r = 1, \dots, k$.   (1.8)

(ii) $u_0^h$ and $u_2^h$ are the attractors, and the linearized problem (1.7) has
only negative eigenvalues for $i = 0, 2$; $u_1^h$ has a 1-dimensional unstable
manifold which consists of orbits connecting $u_1^h$ to each of the other rest
points, and the linearized problem (1.7) has precisely one positive eigenvalue
and the rest negative for $i = 1$.

(iii) Let $u(x,t)$ and $u^h(x,t)$ be the solutions of (1.1) and (1.4),
respectively. Initial data which satisfies $u_1^h(x) < u^h(x,0)$ is in the stable
manifold of $u_2^h$, and there holds:

$\lim_{t \to \infty} \|u(\cdot,t) - u^h(\cdot,t)\|_1 \le Ch^r \|u_2\|_{r+2}$, $r = 1, \dots, k$.   (1.9)

Initial data which satisfies $u^h(x,0) < u_1^h(x)$ is in the stable manifold of
$u_0^h \equiv 0$, and there holds:

$\lim_{t \to \infty} \|u^h(\cdot,t)\|_1 = 0$.   (1.10)

In [10] the "spontaneous" bifurcation (i.e., the bifurcation whereby the
solution suddenly "appears" when a parameter, in our case the length L of
the interval, crosses a certain critical value) was analyzed for the steady-state
problem (1.2). For the practical solution of our problem we can first
find the stable steady-state solution $u_2^h$ as a limit as $t \to \infty$ of an
appropriate solution of the time dependent problem. We then obtain the
unstable steady state solution $u_1^h$ by following the bifurcation diagram.

The outline of the rest of the paper is as follows. In Section 2 we
establish error estimates in the numerical approximation of the spontaneous
bifurcation using general methods of approximation of nonlinear problems in
Brezzi, Rappaz, and Raviart [1].

In Section 3 we establish large time error estimates in the semi-discrete
solution extending the results in Khalsa [5] to take into account the
numerical integration.

Recently in Manoranjan and Mitchell [9] numerical studies of the problem
(1.1.a)-(1.1.c) were carried out, using the finite element discretization with
the product approximation, and estimates for the critical length $L_0$ were
obtained. In Larsson [8] error estimates on an infinite time interval in the
case of convex f were derived for a semilinear parabolic problem using
piecewise linear finite elements. The results in [8] have been extended in [5]
to the case when the solution being approximated is asymptotically stable
(replacing the convexity assumption on f), the initial data is nonsmooth or
incompatible, and piecewise polynomial finite elements are used. See [5] for
other references and discussion.

2. Approximation of the spontaneous bifurcation.

Following [10] we introduce a parameter

$\beta = u(1/2)$,

where u is a solution of (1.2).

Lemma 2.1 [10, pp. 185-190]. (i) There exists a unique branch
$\{L(\beta), u(\beta);\ 0 < \beta < 1\}$ of solutions of (1.2) with

$L''(\beta) > 0$, $0 < \beta < 1$.   (2.1)

(ii) There exists a unique $\beta_0 \in (0,1)$ with

$L'(\beta_0) = 0$.   (2.2)

(iii) Set $L_0 = L(\beta_0)$. Then (1.2) has one solution $u_0 \equiv 0$ for $L < L_0$,
two solutions $u_0$ and $u_1$ for $L = L_0$ and three solutions $u_0$, $u_1$ and $u_2$
for $L > L_0$.
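
The branch $L(\beta)$ can be explored numerically through a first integral of (1.2.a). The sketch below is not taken from the paper: it uses the standard time-map identity $L(\beta) = \int_0^\beta du / \sqrt{2(F(u) - F(\beta))}$, where $F$ is the antiderivative of $u(u-b)(u-1)$ with $F(0) = 0$, obtained by multiplying (1.2.a) by $u_x$ and integrating from the boundary to the midpoint. The substitution $u = \beta - t^2$, the midpoint rule, the value b = 0.25, and the function names are all assumptions. For this cubic the integrand is real only when $F(u) > F(\beta)$ on $[0, \beta)$, which excludes small midpoint values.

```python
import math


def F(u, b):
    """Antiderivative of g(u) = u (u - b)(u - 1) with F(0) = 0,
    so that G(L,u) = 4 L^2 g(u)."""
    return u ** 4 / 4.0 - (1.0 + b) * u ** 3 / 3.0 + b * u ** 2 / 2.0


def length_for_beta(beta, b, n=4000):
    """Time map L(beta) = integral_0^beta du / sqrt(2 (F(u) - F(beta))).

    The substitution u = beta - t^2 removes the inverse-square-root
    singularity at u = beta; the midpoint rule is then applied in t.
    """
    f_beta = F(beta, b)
    dt = math.sqrt(beta) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        gap = F(beta - t * t, b) - f_beta
        if gap <= 0.0:
            raise ValueError("no steady state with this midpoint value")
        total += 2.0 * t / math.sqrt(2.0 * gap) * dt
    return total


def estimate_critical_length(b, lo=0.45, hi=0.95, samples=51):
    """Crude estimate of L_0 as the minimum of L(beta) over a grid of beta."""
    best_beta, best_length = None, float("inf")
    for i in range(samples):
        beta = lo + (hi - lo) * i / (samples - 1)
        length = length_for_beta(beta, b)
        if length < best_length:
            best_beta, best_length = beta, length
    return best_beta, best_length


if __name__ == "__main__":
    beta0, L0 = estimate_critical_length(0.25)
    print(f"beta_0 ~ {beta0:.2f}, L_0 ~ {L0:.3f}")
```

Scanning $\beta$ this way shows $L(\beta)$ growing toward both ends of its admissible range with a single interior minimum, which is the turning point $(L_0, u_1)$ studied in the remainder of this section.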

We next rewrite the problems (1.2) and (1.5) in operator form. If
$T : H_0^1 \to H_0^1$ and $T_h : H_0^1 \to S_0^h$ are the continuous linear operators defined,
for $f \in H_0^1$, by

$Tf = u \in H_0^1$ if $(u_x, v_x) = (f, v)$, $\forall v \in H_0^1$,

and

$T_h f = u^h \in S_0^h$ if $(u_x^h, \chi_x) = (Q_h f, \chi)$, $\forall \chi \in S_0^h$,

then (1.2) and (1.5) are, respectively, equivalent to the problems to find the
pairs $(L,u)$ in $\mathbb{R} \times H_0^1$ and $(L,u^h)$ in $\mathbb{R} \times S_0^h$ such that

$F(L,u) \equiv u + TG(L,u) = 0$,   (2.3)

$F_h(L,u^h) \equiv u^h + T_h G(L,u^h) = 0$.   (2.4)

Lemma 2.2

(i) $F^0 \equiv F(L_0,u_1) = 0$.

(ii) $D_uF^0 \equiv D_uF(L_0,u_1) = I + TD_uG(L_0,u_1) \in \mathcal{L}(H_0^1,H_0^1)$ is singular and $-1$ is
an eigenvalue of the compact operator $TD_uG(L_0,u_1)$ with algebraic
multiplicity 1.

(iii) There exists a unique branch $\{(L(\alpha),u(\alpha));\ |\alpha| < \alpha_0\}$, for some
$\alpha_0 > 0$, of solutions of (2.3) in a neighborhood of the point
$(L_0,u_1)$, where $\alpha \to L(\alpha)$ and $\alpha \to u(\alpha)$ are $C^2$ functions given by

$L(\alpha) = L_0 + \xi(\alpha)$,
$u(\alpha) = u_1 + \eta(\alpha)\varphi_0 + v(\xi(\alpha),\alpha)$.   (2.5)

Here $\xi(0) = \eta(0) = v(0,0) = 0$,

$H_0^1 = \mathrm{Ker}(D_uF^0) \oplus \mathrm{Range}(D_uF^0) \equiv V_1 \oplus V_2$,

$\varphi_0 \in V_1$, $\|\varphi_0\|_1 = 1$, $v \in V_2$.

(iv) $(L_0,u_1)$ is a turning point; in particular, $\xi(\alpha)$ has a local minimum
at $\alpha = 0$ and

$\xi'(0) = 0$, $\xi''(0) > 0$.   (2.6)

Proof. (i) follows from (2.3) and Lemma 2.1. Next set

$$\alpha = \beta - \beta_0.$$

Then the solution branch $\{(L(\alpha),u(\alpha));\ -\beta_0 < \alpha < 1-\beta_0\}$ of (1.2) provided by Lemma 2.1 is clearly a smooth solution branch of (2.3) passing through $(L_0,u_1)$ at $\alpha = 0$. Differentiating (2.3) along this branch we find that

$$0 = \frac{d}{d\alpha}F^0 = D_LF^0\,\frac{dL}{d\alpha}(0) + D_uF^0\,\frac{du}{d\alpha}(0). \tag{2.7}$$

Since by (2.2) $\frac{dL}{d\alpha}(0) = 0$, $D_uF^0$ is singular, and $-1$ is an eigenvalue of the compact operator $TD_uG(L_0,u_1)$. Let $\phi$ be a corresponding eigenvector.

Since all the eigenvectors of $TD_uG(L_0,u_1)$ are smooth we also have

$$\mathcal{L}\phi \equiv \phi'' + D_uG(L_0,u_1)\phi = 0, \qquad \phi(0) = \phi(1) = 0.$$

Since all eigenvalues of $\mathcal{L}$ are simple, the corresponding eigenvalues of $TD_uG(L_0,u_1)$ are also simple, and we arrive at (ii).

Part (iii) follows from (i), (ii) and the classical theory of compact operators. Finally, (iv) follows from (iii), (2.1) and (2.2).

Using degree theory [7] one can obtain from the above lemmas existence, uniqueness and convergence of a branch $\{(L_h(\alpha),u_h(\alpha));\ |\alpha| \le \alpha_0\}$ of solutions of (2.4). To establish the optimal rate of convergence we make an additional assumption (which can be verified numerically).

Theorem  2.1  (i)  Assume  that  (Lq,Uj)  is  a  simple  limit  point  of  F,  l.e. 

DlF°  ^  V2  or,  equivalently,  (D^F0,^)  *  0,  where  ^  V1  and  V2  are  aa  ln 
Lemma  2.2.  Then  there  exists  a  unique  branch  {(Lh(a),uh(a)) ;  |a  |  <  Oq}  of 
solutions  of  (2.4)  (or  (1.5))  in  the  neighborhood  of  the  branch 
{(L(a),u(o);  |a|  <Oq).  This  branch  is  of  class  C**,  and  for  all 
a  €  [ -cXq  ,ag]  and  all  Integer  m  >  0  we  have  the  error  estimate 

|L(")(a)  -  L(m)(a)|  +  l(uh)(B)(a)-u<n)(o)» { 

(2.8) 

<  C  hr  l  lu^(a)l  ,,  r  -  1,  ...,  k. 
t-0  r  * 

(11)  There  exists  a  unique  nondegenerate  turning  point  (LG,uJ)  of  F^  ln 
a  sufficiently  small  neighborhood  of  (Lq,u^),  and  we  have  the  error 
estimate 

I Lh_Lo I  +  luV”ui 1 1  <  c  hr  l  lu^^(a)lr+2»  r  -  1,  ....  k^2.9) 

4*0 
Proof. In [4] we have proved that

$$\|(T-T_h)g\|_1 \le c\,h^r\|g\|_r, \quad r = 1,\dots,k, \quad \forall\, g \in H^r. \tag{2.10}$$

By (i)-(iii) of Lemma 2.2 and (2.10) the assumptions of Theorem 3 in [1] are satisfied, and we have the existence of a unique branch $\{(L_h(\alpha),u_h(\alpha));\ |\alpha| \le \alpha_0\}$ and the error estimate

|L^m)(a)-L(m)(a)|  +  I (uh)(n)(a)-u(,B)(a)l  { 

<c  l  (2.11) 

1*0  da 

Hence (2.8) follows from (2.10), (2.11) and (1.2). Using also (2.6), by [1, Section 4] there exists a unique nondegenerate turning point $(L_0^h,u_1^h)$ of $F_h$ in a sufficiently small neighborhood of $(L_0,u_1)$, and by [1, Theorem 4] we have the error estimate

n  h  1  A 

I  Si0 1  +  'VVl  <  c  l  1  (T-Th)  S-rG(L(a),u(o))  |  I  .(2. 12) 

1-0  da 

Using  again  (2.10)  and  (1.2)  gives  (2.9). 

Remark. Results in [6] imply optimal order error estimates in (2.8) and (2.9) when the $H^1$ norm of the error is replaced by the $L^2$ or $L^\infty$ norm.

3. Large time error estimates for the parabolic problem.

In this section we derive large time error estimates for (1.4) where the initial data is in the domain of attraction of a stable steady-state solution $u^*$, where $u^*$ is $u_0$ or $u_2$. Our approach is a modification of the one in [5], where similar results have been proved without considering numerical integration.

We define the linearized operator $A$ and the corresponding bilinear form by setting

$$Av = -v'' - g'(u^*(x))v,$$

$$A(v,w) = (v',w') - (g'(u^*)v,w),$$

where $g(u) = -G(L,u)$. We assume that for some $\lambda_1 > 0$ there holds

$$\lambda_1\|v\|^2 \le A(v,v), \quad \forall\, v \in H_0^1. \tag{3.1}$$

2  A 

Lemma  3. 1  (5)  (i)  There  exists  a  neighborhood  N  c  Hq  of  u  and  a  with 

0  <  a  <  Aj  such  that  If  u  Is  a  solution  of  (1.1)  with  u(tQ)  €  N,  t^  >  0, 
2 

then  u  (?  Hq  exists  for  all  t  >  0  and  we  have 

*  ”®(^”^q)  * 

lu( t)  -  u  l2  <  c  e  IuUq)  -  u  l2,  t  >  tQ.  (3.2) 

(11)  There  exists  c  -  c(m)  >  0  such  that 


! 


1 

w(t)  -  /  g’(u  +  s(u.-u))ds. 
0 

We  also  set 


Ng(u)  -  {v  €  L^: 

1 u— v 1 L  <  6)  . 

Lemma  3.2  For  any 

e  >  0  there  exists 

6  -  6(e)  >  0 

such 

that  for 

* 

u  €  Ng/2^u  )  an<* 

uh  ^  N6/2^u^  we  ^ave 

uh  €  N6(u*} 

and 

1  w(t)l ,  <  e . 

00 

Proof.  Obviously, 

the  lemma  holds  with 

6  “  2lg"l L  * 

♦ 

Set 

e(t)  -  %(*) 

-  u(t). 

Lemma  3.3.  Let  t^ 

be  as  In  Lemma  3.1. 

Suppose  that 

for  given  e  with 

* 

0  <  e  <  Aj  we  can  find  t^  ■  t^e)  >  such  that  for  some  t  >  t,  the 

semidiscrete  problem  (1.4)  has  a  solution  u^t),  0  <  t  <  t*,  and  for 
* 

tj  <  t  <  t  Lemma  3.2  holds. 

Then  there  exist  y  -  Y(e)  >  0  and  h^  >  0  such  that  for  h  <  hQ 

-y(t-tj)  jj+i  * 

le(t)l  <  Cj(e  1  le(t^)l+h  ),  tj  <  t  <  t  ,  (3.6) 

-1/2  * 

le(t)lL  <  c2h  le(t)l,  tj  <  t  <  t  ,  (3.7) 

where  the  constant  c^  and  C£  do  not  depend  on  h  and  t*. 

Proof.  Following  a  standard  procedure  we  set 

e  -  “h-u  “  (u^-P^u)  ♦  (PjU-u)  ■  0  +  p  , 

where  :  H*  ♦  Is  the  elliptic  projection  defined  by 

((P1v-v)x,xx)  -  0,Vx  €  Sq.  It  Is  well  known  that 
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IP^v-vlj  +  h  *  DPjV-vl  <  c  h8  lvlg,  Vv  €  1  <  s  <  k.  (3.8) 

From  (1.1)  and  (1.4)  we  have  the  equation  for  6: 

(0t.x)  -  -(ex,xx>  ♦  (Qhg(uh>“g(u),x)  ~(p t »X ) •  0-9) 

with  g  =  -G(L,u).  Next  write 

QhG(uh)  “g(u)  *  Q^g^l-g^))  -  (I-Qh)g(u). 

Using  (3.5), 

Qh(g(uh)”g^u))  "  Qhw(0+P)  "  QhH>  +  Qh(w*g'(u*)))0 
-  Qhwp  +  Qhw0  -(I-Qh)g'(u*)e  +  g(u*)0 . 

And  thus  (3.9)  is  written  as 

(flt»x>  ♦  (ex»xx)-(g'(u*)8tx) 

(3.10) 

■  (Qhw0+Qh*V-(I-Qh)(g(u)  +  g'(u*))-pt,x) . 

Taking  x  *  9, 

I012  +  «0xl2-(g,(u*)0,0)-(Qh^0,0) 

■  (Qh^>-d-Qh)(g(u)  +  g'(u*))  -pt,e) 

Using  (3.1).  this  reduces  to 

i0i  4-  lot  +  x.iei2  -  iq. wi,  lei 2 

dt  1  n  L 

m 

<  (iQhW'l  op1  ♦  »Pt»  -n(i-Qh)(g(u) 
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+  g'(u  ))i)  lei 


Since  IQ^I^  *  1,  lwlL  <  C  and  by  Lemma  3.2  I  wl  ^  <  e,  we  have,  setting 

mm  m 

Y  -  Xj  -  e, 


(eYt»e»)  <  c  eYt(lpi+ipti 
+  l(I-Qh)g(u)  +  g'(u  ))l  =  c  eY%(t), 


or  after  integration 


t  _  v 

16(c) I  <  c  e  I0(t  I  +  c  /  eWYU"8;^(s)ds.  (3.11) 

*1 

Hence  we  conclude  that  altogether 

-Y(t-t.) 

Ie(t)l  <  I6(t)l  +  Ip (t ) I  <  e  le(t  )l  +  c  sup*(s) 

t  j<  8<  t 

Taking  into  account  (3.8)  and  the  fact  that  (3.8)  also  holds  for  Pj  replaced 

by  Qh> 

-y(t-t  )  . 

le(t)l  <  e  le(t.)l  +  c  h  max  (lu(s)l.  ,  +  lu  (a)l.  .).  (3.12) 

1  1  1  K  1 


Finally,  estimating  u  and  ut  by  Lemma  3.1,  we  arrive  at  (3.6).  By  using 
the  approximation  properties  of  S  and  an  inverse  inequality  (see  e.g.  [5]), 
we  also  obtain  (3.7). 

From  Lemma  3.3  we  obtain  in  the  same  way  as  in  (3) 


Theorem  3.1.  There  exist  t^  >  0,  y  >  0  and  h^  >  0  such  that  for  h  <  hQ 

-Y(t-t.)  .  . 

Iuh(t)-u(t)l  <  c(t  »uh(t1)-u(t1)l  +  n  *),  t  <  tj.  (3.13) 


The order of convergence for $u_h(t_1) - u(t_1)$ depends on the smoothness of $u(t)$ on $(0,t_1]$, which in turn depends on the compatibility and smoothness of the initial data v. In particular, for $v \in H^\alpha$, $\alpha > 5/2$ (in the absence of nonlocal compatibility conditions on v) we have [3, 8] for any $\varepsilon > 0$

-l+(5/2-e  -8) 

lut(8)«B  <  c  s  .  (3.14) 

By  a  slight  modification,  to  Incorporate  the  numerical  integration,  of  a 
standard  procedure  we  can  obtain  the  estimate 

g  t 

lu  (t)-u(t)l  <  c  lv  -vl  +  c  hp  { I  vl  a  +  /  lu. I.ds} ,  0  <  t  <  t.  ,  I  <  0  <  k  +  1. 

h  h  p  q  t  fs  l 

(3.15) 

In  order  for  the  Integral  in  (3.15)  to  converge  we  must  have 
8  <  5/2-e,  e  >  0.  We  thus  obtain  from  Theorem  3.1 

Corollary.  Let  v  €  h“,  a  >  5/2.  Then  for  some  >  0,  y  >  0,  hQ  >  0,  any 
e  >  0,  h  <  hQ  we  have 

Iuh(t)-u(t)l  <  c  (e  1  h3'4  e  +  hK  *),  t  >  tr  (3.16) 

Remark. By a modification, to incorporate numerical integration, of the argument in [5] one can also establish error estimates similar to (3.13), (3.16) for the gradient of the error and maximum norm error estimates.
ABSTRACT. Several new methods for the numerical solution of convection dominated, convection-diffusion equations are presented. These methods are high accuracy methods and, in some cases, monotone schemes. Further, they can be implemented in such a way as to require only an asymptotically negligible increase in storage over usual first order methods. Thus they are promising candidates for vectorization. Numerical experiments are presented and some error estimates, proven by the authors for these schemes, are reviewed.

1. INTRODUCTION. In this paper we consider the problem of solving convection-diffusion equations, with dominant convection terms, by a high accuracy numerical method. There are two problems which provide convenient models for these effects, which we now consider.

1.1. The Model Problems. Let $\Omega \subset \mathbb{R}^2$ be a domain; then we seek u(x,y), for $(x,y) \in \Omega$, satisfying

$$-\varepsilon\Delta u + v_1(x,y)u_x + v_2(x,y)u_y + g(x,y)u = q(x,y), \quad (x,y) \in \Omega, \qquad u = 0 \ \text{on}\ \partial\Omega. \tag{1.1}$$

In the above, u typically denotes a concentration or temperature that is convected through $\Omega$ by the velocity field $v(x,y) = (v_1,v_2)$, $\varepsilon > 0$ is a measure of the diffusion effects relative to v, q(x,y) is a source and g(x,y)u represents a loss term.

In many typical applications $\varepsilon \ll |v|$ and we will particularly be interested in cases when the cell Peclet number $h|v|/2\varepsilon \gg O(1)$, where h is a typical meshwidth used by a numerical scheme for (1.1). Boundary conditions other than Dirichlet are also possible. We will focus on Dirichlet conditions for our exposition here.
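As a small illustration of the regime in question, the cell Peclet number can be evaluated directly. The sketch below is only an assumption-laden example: the function name `cell_peclet` is hypothetical, and the sample values (unit speed, ε = 10^-2, N = 8 and 15) are borrowed from Example 1 of Section 3 rather than prescribed here.

```python
def cell_peclet(h, speed, eps):
    """Cell Peclet number h*|v|/(2*eps) for mesh width h, speed |v|, diffusion eps."""
    return h * abs(speed) / (2.0 * eps)


# Sample values assumed from Example 1 of Section 3: |v| = 1, eps = 1e-2.
for n in (8, 15):
    print(n, cell_peclet(1.0 / n, 1.0, 1e-2))
```

Values well above 1 indicate that the mesh does not resolve the diffusive length scale, which is the convection-dominated situation the methods below are aimed at.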


* 

On  leave  from  the  School  of  Mathematics,  Georgia  Tech.  The 
work  reported  herein  was  partially  completed  while  the  second 
author  was  visiting  the  Math.  Department  of  Carnegie  Mellon 
University. 
We also consider the 1-D version of (1.1), given by

$$-\varepsilon u'' + f(x)u' + g(x)u = q(x), \quad 0 < x < 1, \quad u = u(x), \qquad u(0) = \alpha, \quad u(1) = \beta. \tag{1.2}$$

However, we consider (1.1) to be the basic model problem. Methods that can only be applied to the 1-D case are not interesting for the problems we have in mind.

The  algorithm  we  consider  is  a  modification  of  the 
defect-correction  method  originally  proposed  by  Hemker  [29] , 
[30].  It  is  extremely  promising  as  a  high  accuracy,  minimum 
storage  algorithm  for  convection  diffusion  problems.  Before 
we  introduce  the  method  in  Section  2  we  now  consider  a  few 
typical  applied  problems  to  which  it  could  be  applied.  We 
will  then  outline  the  criteria  which  we  used  to  judge  the 
potential  success  or  failure  of  a  numerical  method  applied  to 
these  problems. 

1.2. Two Representative Applied Problems. It is important to note that convection-diffusion equations frequently arise in more complicated contexts: in systems of equations, nonlinear problems, complicated geometries, with reaction terms present, etc. To illustrate this consider the following two examples:

Example 1: The Navier-Stokes Equations. The streamfunction-vorticity formulation of steady incompressible plane flow gives the following system of partial differential equations in a region $\Omega \subset \mathbb{R}^2$:

$$-\nu\Delta\omega + v_1\omega_x + v_2\omega_y = 0, \qquad \omega = \Delta\psi, \quad v_1 = \psi_y, \quad v_2 = -\psi_x, \tag{1.3}$$

which consists of a convection-diffusion problem coupled with a Poisson equation. The velocity field $v = (v_1,v_2)$ is to be determined as part of the problem.

Example  2:  Convection-Diffusion-Reaction  Equations. 
Under  the  constant  density  hypothesis,  the  steady  combustion 
of  a  laminar,  premixed  flame  is  governed  by  a  system  of 
convection-diffusion-reaction  equations  of  the  form: 

-  -(Ley-  AUj  +  ^  f..  (u)  *  q..  (x,u)  ,  j  =  1,...,J. 
The general combustion equations for a premixed gas consist of a nonlinear convection-diffusion-reaction system for conservation of energy and the various chemical species in the exothermic reaction. These are coupled to the Navier-Stokes equations (Example 1) for the gas in which the reaction takes place. Under the constant density hypothesis, we can uncouple the two systems. For example, with constant velocity, in 1-D, with the constant density hypothesis, for a simple A → B reaction for a steady premixed laminar flame, we have the system [32]:


pv 


-iL  y 

dX  XA 


1 

LeA 


Va 


I 


pv 


dT 

dx 


d2T 


dx 


+  qakaV 


»  PYa  exp(-NA/T), 


where ρ is the density, v the gas velocity, $Y_A$ the mass fraction of the reactant A, $Le_A$ the Lewis number of A, T the temperature, $Q_A$ the heat released, $N_A$ the activation energy and $k_A$ a preexponential factor. These equations have been nondimensionalized.

Even  in  the  simplest  type  of  reaction  this  problem 
already  gives  various  numerical  schemes  great  difficulties. 
This  is  especially  true  when  the  velocity,  v,  is  not  known 
exactly  and  may  contain  spurious  oscillations. 

1.3.  Goals  of  Numerical  Methods  for  the  Model  Problems. 
For  a  numerical  method  to  show  promise  on  the  types  of  applied 
problems  that  occur  in,  e.g.,  the  previous  two  examples,  it 
must  have  several  characteristics  when  used  on  the  model 
problems  (1.1),  (1.2).  These  goals  include  (but  are  not 
restricted  to)  the  following: 

(1)  It  must  be  extendable  to  systems,  nonlinear  problems, 
etc. ,  without  severely  degrading  its  performance. 

(2)  The  stability  properties  must  be  independent  of  the  mesh 
geometry,  size  and  orientation,  with  respect  to  e  and  v. 

(3)  The  method  should  have  high  accuracy  at  least  in  smooth 
regions.  This  is  especially  important  for  problems  in 
high  dimensions  as  high  accuracy  reduces  the  number  of 
points  necessary  to  achieve  a  certain  percent  error 
and  thus  reduces  the  total  storage  needed. 
(4) It should give oscillation free solutions without excessive artificial viscosity. In combustion problems, oscillations in the velocity field can trigger premature detonation while excessive artificial viscosity can lead to flame squelching.

(5)  Storage  requirements  must  be  minimized  if  the  method  is 
to  be  useful  for  3-D  problems. 

(6)  The  method  should  be  easy  to  implement  using,  as  far  as 
possible,  readily  available  software. 

In the next section we will introduce a method that we feel satisfies the above six criteria, namely the modified defect-correction method. In Section 3 we give some sample numerical results for the method. We also compare it experimentally to a variant of the streamline diffusion method. We summarize in Section 4 the theoretical results that have been proved by the authors.

1.4. Other Approaches. Exponentially fitted methods have been studied intensively for the problem (1.2) in 1-D, see Allen and Southwell [1]. For extensions of this work see, e.g., Kellogg and Tsan [17], de Groen and Hemker [6], El-Mistikawy and Werle [9] and Berger, Solomon and Ciment [3]. These exponentially weighted schemes are frequently the best method for (1.2) but can prove to be costly, in particular for nonlinear problems in 2-D and 3-D.

Much less work has been done on numerical methods for the 2-D problem. Kellogg [16] has shown convergence of the 2-D Allen and Southwell scheme under assumptions on the vector field v. Raithby [24] has considered a skew upwind scheme and Hemker [29] considers a scheme based upon a convex combination of the usual finite difference and finite element approximations. Streamline upwinding has been considered by a number of people: Brooks and Hughes [5], Kelly, Nakazawa and Zienkiewicz [18], and Navert [23], who analyzed a finite element streamline diffusion scheme. With the exception of the 2-D Allen and Southwell scheme, these others do not, in general, converge uniformly in ε in 2-D, Roos [25], and most do not even possess a discrete maximum principle. A satisfactory, high accuracy method for the 2-D problems remains to be found.


2.  ITERATED  DEFECT  CORRECTION. 

2.1. Basic Method. We now present the method in its simplest context. Consider (1.1), let $\Omega$ be, e.g., the unit square, $h = 1/N < 1$, $x_i = ih$, $y_j = jh$ $(i,j = 0,1,\dots,N)$, $v_{ij} = v(x_i,y_j)$, $u_{ij} = u(x_i,y_j)$, etc. Define the operators $D_x^+$, $D_x^-$, $D_x^0$ by

$$D_x^+u_{ij} = h^{-1}(u_{i+1,j} - u_{ij}), \quad D_x^-u_{ij} = h^{-1}(u_{ij} - u_{i-1,j}), \quad D_x^0u_{ij} = (2h)^{-1}(u_{i+1,j} - u_{i-1,j}).$$

The operators $D_y^+$, $D_y^-$, $D_y^0$ are similarly defined. Let us further introduce the notation:

$$\Delta^hu_{ij} = (D_x^+D_x^- + D_y^+D_y^-)u_{ij}, \qquad \nabla^hu_{ij} = (D_x^0u_{ij}, D_y^0u_{ij}),$$

$\Omega^h$ = interior nodes = $\{(x_i,y_j):\ 1 \le i < N,\ 1 \le j < N\}$,

$\Gamma^h$ = boundary nodes = $\{(0,y_j):\ 0 \le j \le N\} \cup \{(1,y_j):\ 0 \le j \le N\} \cup \{(x_i,0):\ 0 \le i \le N\} \cup \{(x_i,1):\ 0 \le i \le N\}$.

The first step is to calculate the usual artificial viscosity solution given by

$$L^h_{\varepsilon_0}U^1_{ij} \equiv -\varepsilon_0\Delta^hU^1_{ij} + v_{ij}\cdot\nabla^hU^1_{ij} + g_{ij}U^1_{ij} = q_{ij}, \quad (x_i,y_j) \in \Omega^h, \qquad U^1_{ij} = 0, \quad (x_i,y_j) \in \Gamma^h, \tag{2.1}$$

where

$$\varepsilon_0 = h \max_{1 \le i,j \le N}\{|v_{ij}\cdot(1,0)|,\ |v_{ij}\cdot(0,1)|\}/2. \tag{2.2}$$

Next an updated approximation is calculated via

$$R^n_{ij} = q_{ij} - L^h_\varepsilon U^n_{ij}, \quad (x_i,y_j) \in \Omega^h, \tag{2.3}$$

$$L^h_{\varepsilon_0}E_{ij} = R^n_{ij}, \quad (x_i,y_j) \in \Omega^h, \qquad E_{ij} = 0, \quad (x_i,y_j) \in \Gamma^h, \tag{2.4}$$

$$U^{n+1}_{ij} = U^n_{ij} + E_{ij}, \tag{2.5}$$

where

$$L^h_\varepsilon U^n_{ij} = -\varepsilon\Delta^hU^n_{ij} + v_{ij}\cdot\nabla^hU^n_{ij} + g_{ij}U^n_{ij}.$$

We stop after two or three iterations; otherwise, Hemker [30], if $U^n \to U^\infty$ then $L^h_\varepsilon U^\infty_{ij} = q_{ij}$ and $U^\infty$ is the highly oscillatory, unsatisfactory central difference approximation.
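The iteration (2.1)-(2.5) can be sketched in a few lines for the 1-D analogue (1.2). The code below is a minimal illustration only, under several assumptions that are not part of the text: constant coefficients f, g, q, homogeneous Dirichlet data, a convection-dominated mesh (so that the 1-D analogue of (2.2), ε0 = h|f|/2, exceeds ε), and hypothetical helper names (`solve_tridiagonal`, `stencil`, `apply_tridiagonal`, `defect_correction`).

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (no pivoting) for a tridiagonal linear system."""
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def stencil(n, diffusion, f, g):
    """Central-difference rows of -diffusion*u'' + f*u' + g*u on the n-1 interior nodes."""
    h = 1.0 / n
    m = n - 1
    lower = [-diffusion / h**2 - f / (2 * h)] * m
    diag = [2 * diffusion / h**2 + g] * m
    upper = [-diffusion / h**2 + f / (2 * h)] * m
    return lower, diag, upper


def apply_tridiagonal(lower, diag, upper, u):
    """Matrix-vector product with a tridiagonal matrix (zero boundary values)."""
    m = len(u)
    out = []
    for i in range(m):
        s = diag[i] * u[i]
        if i > 0:
            s += lower[i] * u[i - 1]
        if i < m - 1:
            s += upper[i] * u[i + 1]
        out.append(s)
    return out


def defect_correction(n, eps, f, g, q, iterations):
    """Return the iterate U^iterations of the 1-D analogue of (2.1)-(2.5)."""
    h = 1.0 / n
    eps0 = h * abs(f) / 2.0            # assumed 1-D analogue of (2.2)
    a0 = stencil(n, eps0, f, g)        # artificial viscosity operator
    a = stencil(n, eps, f, g)          # central difference operator, residuals only
    rhs = [q] * (n - 1)
    u = solve_tridiagonal(*a0, rhs)    # step (2.1)
    for _ in range(iterations - 1):
        au = apply_tridiagonal(*a, u)
        r = [qi - ai for qi, ai in zip(rhs, au)]   # residual (2.3)
        e = solve_tridiagonal(*a0, r)              # correction (2.4)
        u = [ui + ei for ui, ei in zip(u, e)]      # update (2.5)
    return u


print(defect_correction(8, 1e-2, 1.0, 0.0, 1.0, 3))
```

Only the artificial viscosity operator is ever inverted; the central difference operator enters through the residual alone, which is the structural point of the method.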

We  will  further  refine  the  basic  method  in  §2.3. 


Notes

(1) The above method extends readily to non-uniform meshes and more complex geometries.

(2) Condition (2.2) ensures that the discrete coefficient matrix is a diagonally semi-dominant M-matrix [26], provided $g \ge 0$.

(3) There are a number of other possible and promising correctors to use in place of $L^h_{\varepsilon_0}$. For example, we may try the streamline diffusion corrector, $L_{\varepsilon,\delta}$, defined as follows. If

$$\frac{\partial}{\partial v} = v\cdot\mathrm{grad}, \qquad \frac{\partial^2}{\partial v^2} = \frac{\partial}{\partial v}\Bigl(\frac{\partial}{\partial v}\Bigr),$$

we define

$$L_{\varepsilon,\delta}w = -\varepsilon\Delta w - \delta\frac{\partial^2w}{\partial v^2} + v\cdot\nabla w + gw,$$

where $0 \le \delta = O(h)$. When v is oriented with, or at 45° to, the mesh, $L^h_{\varepsilon,\delta}$ (the usual discretization of $L_{\varepsilon,\delta}$) works well, see Sections 3 and 4. Alternately, we may use the upwind approximation:

$$L^h_{\alpha,\beta}w_{ij} = -\varepsilon\Delta^hw_{ij} + v_1(2h)^{-1}[(-1+\alpha)w_{i-1,j} - 2\alpha w_{ij} + (1+\alpha)w_{i+1,j}] + v_2(2h)^{-1}[(1+\beta)w_{i,j+1} - 2\beta w_{ij} + (-1+\beta)w_{i,j-1}] + gw_{ij},$$

where α, β are chosen to ensure that $L^h_{\alpha,\beta}$ is an M-matrix. We mention, in particular, the choice of "optimal upwind parameters" ($\rho = h/\varepsilon$)

$$\alpha = -2/(\rho v_1) + \coth(\rho v_1/2), \qquad \beta = -2/(\rho v_2) + \coth(\rho v_2/2).$$

Another possibility is to use different amounts of viscosity in different directions. We replace $L^h_{\varepsilon_0}$ by the central difference approximation to $L_{\varepsilon_1,\varepsilon_2}w = -\varepsilon_1w_{xx} - \varepsilon_2w_{yy} + v_1w_x + v_2w_y + gw$, where

$$\varepsilon_1 \ge \max\{\varepsilon,\ hv_1/2\}, \qquad \varepsilon_2 \ge \max\{\varepsilon,\ hv_2/2\}.$$

The use of $L^h_{\varepsilon_0}$ is much cruder than these choices but it has the advantage of being independent of the velocity field v.
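For orientation, the "optimal upwind parameter" formula quoted in Note (3) is easy to tabulate. The following is a sketch only; the function name `optimal_upwind_parameter` and the treatment of a vanishing velocity component (returning the limiting value 0) are assumptions made for this illustration.

```python
import math


def optimal_upwind_parameter(h, eps, v):
    """alpha = coth(rho*v/2) - 2/(rho*v) with rho = h/eps, for one velocity component v."""
    x = (h / eps) * v / 2.0  # x = rho*v/2, so that 2/(rho*v) = 1/x
    if x == 0.0:
        return 0.0  # assumed limiting value: coth(x) - 1/x tends to 0 as x tends to 0
    return 1.0 / math.tanh(x) - 1.0 / x


# Sample values assumed for illustration: h = 1/8, eps = 1e-2, v = 1.
print(optimal_upwind_parameter(1.0 / 8, 1e-2, 1.0))
```

The parameter is an odd function of v, is close to 0 when the cell Peclet number is small (nearly central differencing), and approaches ±1 when it is large (nearly full upwinding).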

Computational Complexity. The discrete system of equations (2.1) may equivalently be expressed in the form

$$A_{\varepsilon_0}U^1 = q, \tag{2.7}$$

where (i) $A_{\varepsilon_0}$ is an $N^2 \times N^2$ banded matrix with the main diagonal and four off-diagonals nonzero, and half-bandwidth equal to $N = h^{-1}$, and (ii) $q = (\dots q_{ij} \dots)^T$ is an $N^2 \times 1$ vector. Likewise the updated solution (2.5) is given by

$$A_{\varepsilon_0}U^{n+1} = q + (A_{\varepsilon_0} - A_\varepsilon)U^n, \quad n = 1,2,\dots \tag{2.8}$$

The "best" method of solution for (2.7) and (2.8) is dependent upon the size of the linear system and the facilities available to the researcher (e.g., scalar or vector computer). Solutions of (2.7) and (2.8) via (a) direct methods and (b) iterative methods (e.g., conjugate residuals, conjugate gradient, S.O.R., etc.) are briefly discussed below.

(a) Direct Method of Solution of (2.7) and (2.8). Using a direct method the system of equations (2.7) is factored initially as $A_{\varepsilon_0} = LU$. Each iterate past the first requires only a residual calculation, a forward solve and a back solve. Thus, the computational complexity involved in computing $U^2$, $U^3$, etc., is negligible with respect to the amount required to compute $U^1$, the usual artificial viscosity approximation.

Since we only use $A_\varepsilon$ for a residual calculation we may store it using only $O(N^2)$ locations, which is also negligible additional storage compared to the $O(N^3)$ locations needed to compute the L-U decomposition of $A_{\varepsilon_0}$.

(b) Iterative Method of Solution of (2.7) and (2.8). Iterative methods can be very efficient for the solution of (2.7) and (2.8) because we take $U^n$ as an initial guess in calculating $U^{n+1}$. Typically, only a few iterations are needed to calculate $U^{n+1}$ since we begin with a good initial guess. The coefficient matrix, $A_{\varepsilon_0}$, is a diagonally dominant M-matrix and thus such methods as conjugate residuals and (overrelaxed multilevel) Jacobi will converge nicely. Moreover the storage requirements are reduced significantly as both $A_{\varepsilon_0}$ and $A_\varepsilon$ can be efficiently stored as five $N^2 \times 1$ vectors each.

2.2. 4th Order Corrector. Hemker [29], [30] reports that, based on "local mode" analysis, we can replace $L^h_\varepsilon$ in (2.3) by a 4th order corrector $\tilde L^h_\varepsilon$, such as the nine point cross:

$$\tilde L^h_\varepsilon U_{ij} = \begin{cases} -\varepsilon\Delta^h_4U_{ij} + v_{ij}\cdot\nabla^h_4U_{ij} + g_{ij}U_{ij}, & 1 < i,j < N-1,\\ -\varepsilon\Delta^hU_{ij} + v_{ij}\cdot\nabla^hU_{ij} + g_{ij}U_{ij}, & i \text{ or } j = 1 \text{ or } N-1, \end{cases}$$

$$\Delta^h_4U_{ij} = (12h^2)^{-1}(-U_{i+2,j} + 16U_{i+1,j} - 60U_{ij} + 16U_{i-1,j} - U_{i-2,j} - U_{i,j+2} + 16U_{i,j+1} + 16U_{i,j-1} - U_{i,j-2}),$$

$$\nabla^h_4U_{ij} = (4h)^{-1}(-U_{i+2,j} + 4U_{i+1,j} - 4U_{i-1,j} + U_{i-2,j},\ -U_{i,j+2} + 4U_{i,j+1} - 4U_{i,j-1} + U_{i,j-2}),$$

and expect $\|u - U^*\| = O(h^4)$.

This is untested in practice but appears especially promising as the implementation of this higher order corrector can be achieved with very little additional work or storage. Specifically, the associated corrector matrix $\tilde A_\varepsilon$ has nine nonzero diagonals (which may be efficiently stored as nine $N^2 \times 1$ vectors since we only use $\tilde A_\varepsilon$ for calculating residuals) and the computation of the RHS of (2.8) only requires an additional $4N^2$ scalar multiplications and additions.

2.3. Filter Step. We have shown that (2.1)-(2.5) gives a good approximation away from the layers (even on a coarse mesh). Experimentally, we observe that with each iteration:

(i) Outside of the layer the approximation improves.
(ii) The numerical boundary layer spreads.

To avoid the spread of the boundary layer and yet still iteratively improve the approximation we modify (2.1)-(2.5) to include a filter step. Specifically, replace $R_{ij}$ in (2.4) by $F(R_{ij})$. The filter step can also be inserted in (2.5) and gives the Modified Defect Correction Method, Hemker [30].

We are currently investigating several different filters and comparing performance.
The order of the filter should be consistent with the accuracy of the method. For example, if we use a fourth order corrector we apply a fourth order filter. Specifically, for smooth w(x,y), $w - F(w) = O(h^4)$. In other words, if $F(w_{ij}) = \sum_{\ell,m}\lambda_{\ell m}w_{i+\ell,j+m}$ then for $\theta = (\theta_1,\theta_2)$

$$f(\theta) = \sum_{\ell,m}\lambda_{\ell m}e^{i\ell\theta_1}e^{im\theta_2} = 1 + O(|\theta|^4)$$

as $|\theta| \to 0$.

3. SAMPLE NUMERICAL RESULTS. We now give some sample numerical results for the defect correction algorithm and compare them with a variant, studied by Ervin and Layton [12], of the streamline diffusion method of Brooks and Hughes [5].

To summarize these findings: for simple model problems with constant coefficients or simple layer structure, etc., both methods yield good answers. The numerical experiments in these cases confirm the theoretical predictions as to their rates of convergence (see Section 4). On more complex problems with multiple and interior layers, attractive type turning points, etc., the defect correction approach proved to be more robust and gave excellent approximations, both qualitatively and quantitatively. On these complex problems the modified streamline diffusion approach was sensitive to the precise choice of the diffusion parameter, δ. In some cases oscillations appeared near turning points, possibly due to a non-optimal choice of δ.

The modified defect correction method extends directly and easily to 2-D with excellent results. Extension of the streamline diffusion method to 2-D as a monotone type scheme is still an open problem and under investigation.

Before we give the numerical results we will introduce the "adjusted" streamline diffusion method and discuss some of the filter steps that have actually been used in (2.4).

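3.1. Adjusted Streamline Diffusion Method. Define, as above, $\partial/\partial v = v\cdot\mathrm{grad}$, $\partial^2/\partial v^2 = \partial/\partial v(\partial/\partial v)$. The continuous operator is defined via applying the operator $(I - \delta\,\partial/\partial v)$ to (4.1) and omitting the $O(\varepsilon\delta)$ (third order) term. This results in the B.V.P.

$$\tilde L_{\varepsilon,\delta}w \equiv -\varepsilon\Delta w - \delta\frac{\partial^2w}{\partial v^2} + \frac{\partial}{\partial v}(w - \delta gw) + gw = q - \delta\frac{\partial q}{\partial v} \ \text{in}\ \Omega, \qquad w = 0 \ \text{on}\ \partial\Omega. \tag{3.1}$$

Note that $\tilde L_{\varepsilon,\delta}$ has the same form as $L_\varepsilon$, except for an added streamline diffusion term. Here $\delta = O(h)$ is picked to attempt to ensure that the central difference approximation to (3.1), $\tilde L^h_{\varepsilon,\delta}$, is of monotone type (see Layton and Morley [22]) once the boundary unknowns are eliminated from the linear system.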
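The formal computation behind (3.1) can be written out explicitly. The following is a sketch under the assumptions that the underlying equation has the form $-\varepsilon\Delta w + \partial w/\partial v + gw = q$ and that w, g and q are smooth enough to be differentiated along v:

```latex
\begin{align*}
\Bigl(I - \delta\frac{\partial}{\partial v}\Bigr)
\Bigl(-\varepsilon\Delta w + \frac{\partial w}{\partial v} + g w\Bigr)
  &= \Bigl(I - \delta\frac{\partial}{\partial v}\Bigr) q, \\
-\varepsilon\Delta w
  + \varepsilon\delta\,\frac{\partial}{\partial v}\Delta w
  - \delta\frac{\partial^2 w}{\partial v^2}
  + \frac{\partial}{\partial v}\bigl(w - \delta g w\bigr) + g w
  &= q - \delta\frac{\partial q}{\partial v}.
\end{align*}
```

Dropping the third order term $\varepsilon\delta\,\partial(\Delta w)/\partial v$, which is $O(\varepsilon\delta)$, leaves an equation of the same form as (3.1).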

Define the usual $O(h^2)$ approximations to $\partial^2/\partial x\partial y$:

$$D^+_{xy} = \tfrac12(D_x^+D_y^+ + D_x^-D_y^-), \qquad D^-_{xy} = \tfrac12(D_x^+D_y^- + D_x^-D_y^+), \qquad D^\theta_{xy} = \theta D^+_{xy} + (1-\theta)D^-_{xy} \quad (0 \le \theta \le 1),$$

where

$$D_x^+u(x,y) = h^{-1}(u(x+h,y) - u(x,y)),$$

etc. One $O(h^2)$ discretization of (3.1) proceeds as follows. The principal part, $L^0_{\varepsilon,\delta}$, of $\tilde L_{\varepsilon,\delta}$ is given by

$$L^0_{\varepsilon,\delta}w = -(\varepsilon + \delta v_1^2)w_{xx} - 2\delta v_1v_2w_{xy} - (\varepsilon + \delta v_2^2)w_{yy}$$

and is discretized by:

$$(L^0_{\varepsilon,\delta})^h = -(\varepsilon + \delta v_1^2)D_x^+D_x^- - 2\delta v_1v_2D^\theta_{xy} - (\varepsilon + \delta v_2^2)D_y^+D_y^-.$$

The remaining lower order terms in (3.1) are approximated by

(i) upwind differences for terms premultiplied by δ,

(ii) central differences for the remaining terms.

This yields the discrete operator $\tilde L^h_{\varepsilon,\delta}$. $\theta \in [0,1]$ and $\delta = O(h)$ can be chosen point by point. The "optimal" choice for stability of these two parameters is not clear at this moment, see Section 4 for more details.

3.2. Various Filter Steps. We have found that the method (2.1)-(2.5) is fairly robust as to the specific filter step used. For example, in the results which we are now reporting, we used a clipping-filter on the residual vector $R^n = q - L^h_\varepsilon U^n$:

Calculate $\bar r$ and $r_\sigma$ (mean and standard deviation of the entries $r_j$ of $R^n$).
If $|r_j - \bar r| > r_\sigma$, set $r_j := 0$. (3.2)
Otherwise $r_j := r_j$.

Hemker [30] proceeded by filtering $U^{n+1}$ by applying one step of Jacobi iteration to $U^{n+1}$ using the artificial viscosity matrix. Experiments are currently underway with other averaging operators, Tchebyscheff filters, etc.
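The clipping rule (3.2) translates almost literally into code. The sketch below is an illustration rather than the authors' implementation: the function name `clip_residual` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not say which is meant.

```python
import statistics


def clip_residual(residual):
    """Zero every entry lying more than one standard deviation from the mean."""
    mean = statistics.mean(residual)
    spread = statistics.pstdev(residual)  # population standard deviation (assumed)
    return [0.0 if abs(r - mean) > spread else r for r in residual]


# A residual with one large entry, as might occur inside a boundary layer.
print(clip_residual([0.1, 0.1, 0.1, 0.1, 5.0]))
```

Entries of typical size pass through unchanged, so the correction step is left untouched where the residual is well behaved and is suppressed only at outlying nodes.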

3.3. Numerical Experiments. We now give some 2-D and 1-D examples for the methods discussed above.

Example 1.

$$L_\varepsilon u \equiv -\varepsilon\Delta u + u_x = g \ \text{in}\ \Omega = (0,1)\times(0,1), \qquad u = 0 \ \text{on}\ \partial\Omega. \tag{3.3}$$
g = g(x,y,ε) was chosen so that the true solution was given by

$$u(x,y) = \sin(\pi x)\sin(\pi y)\{e^{-(1-x)/\varepsilon} + e^{-(1-y)/\sqrt{\varepsilon}} + e^{-y/\sqrt{\varepsilon}}\}.$$

Note that u(x,y) has characteristic boundary layers of width $O(\sqrt{\varepsilon})$ along the boundaries y = 0 and y = 1 and an outflow boundary layer of order ε along x = 1, Eckhaus [7]. We tested the methods with $h = \Delta x = \Delta y = N^{-1}$, N = 8, 15 and $\varepsilon = 10^{-2}$, so that the mesh was extremely coarse with respect to the O(ε) outflow boundary layer.

With $L^h_{\varepsilon_0}$ as the correction operator, we found that, away from the layers, the defect correction method gave perfect $O(h^2)$ convergence. In the figures that follow we give, respectively, the bilinear interpolant of the true solution and the approximate solution and error plot at the 4th iterate for h = 1/7 and 1/14. It is interesting to note that the corners where the characteristic and outflow layers overlap appear to be the most difficult areas in which to approximate u. The global errors and decay exponents are given in Table 3.1.

Table 3.1. Global errors in the defect-correction approximation to (3.3). The correction operator is taken to be the artificial viscosity approximation to $L_\varepsilon$ and no filter was used.

Iterate   Max-Norm Error               Decay      l2-Error                     Decay
          N = 8         N = 15         Exponent   N = 8         N = 15         Exponent
1         (illegible)   (illegible)    .2         (illegible)   (illegible)    .17
2         (illegible)   (illegible)    .37        (illegible)   (illegible)    .35
3         .079          .056           .53        .030          .021           .58
4         .075          .053           .56        .027          (illegible)    .68

It  is  remarkable  that  the  method  seems  to  be  converging 
(slowly)  even  in  the  layers  and  in  the  corners  mentioned 
above.  This  is  not  even  predicted  by  the  estimates  in  1-D. 

We next tried to improve these results by using the
streamline diffusion operator as the corrector, $L^h_{\varepsilon,\delta}$, with
$\delta = O(h)$.  The improvement was dramatic and is summarized in
Table 3.2.

Table 3.2.  Approximate solution of (3.3) using the streamline
diffusion corrector; no filter is used.

           Max-Norm Error       Decay      l2-Error             Decay
Iterate    N = 8     N = 15     Exponent   N = 8     N = 15     Exponent
   1       .059      .033       .92        .025      .014       .9
   2       .039      .016       1.44       .013      .0044      1.7
   3       .0?3      .012       1.6        .0082     .0025      1.9
   4       .030      .010       1.7        .0065     .0020      1.9

The  example  (3.3)  is  the  best  possible  for  the  streamline 
diffusion  correction:  no  grid  orientation  effects  are  present 
in  the  numerical  model  of  (3.3)  since  the  velocity  field 
v  =  (1,0)  is  oriented  with  the  mesh. 

Many more experiments must be performed in 2-D to validate
the method and to test various correctors.

The  next  example  illustrates  the  advantage  of  applying 
a  simple  filter  when  using  the  defect-correction  method. 

Example  2. 


$-\varepsilon u'' + u' - u = 0$,
$u(0) = 1$,  $u(1) = 1$.        (3.4)


The filter used for this and the following 1-D examples was
the clipping filter given in (3.2).

With $\varepsilon = 10^{-2}$ and h = 1/10 and 1/20, the error and the
decay exponents are given in Tables 3.3 and 3.4 for approximations
computed without and with using the filter, respectively.

Moreover, observe that after three iterations the approximate
solution obtained simply by iterating is beginning to
oscillate about the true solution.  The filter step has
virtually no effect on the approximate solution away from
the layer.  It actually provides a very good answer up to
the edge of the boundary layer.
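
The decay exponents in Tables 3.3 and 3.4 are consistent with the usual observed-order formula $p = \log(e_{h_1}/e_{h_2}) / \log(h_1/h_2)$. The text does not state this definition explicitly, so the sketch below is an assumption that happens to reproduce the tabulated values; the function name `decay_exponent` is hypothetical.

```python
import math

def decay_exponent(err_coarse, err_fine, h_coarse, h_fine):
    """Observed order p in err ~ C * h**p, from errors on two meshes."""
    return math.log(abs(err_coarse) / abs(err_fine)) / math.log(h_coarse / h_fine)

# Errors at x_n = 0.1, 0.2, 0.3 for h = 0.1 and h = 0.05 (Table 3.3).
rows = [(0.200e-3, 0.483e-4), (0.444e-3, 0.107e-3), (0.738e-3, 0.177e-3)]
exponents = [decay_exponent(e1, e2, 0.1, 0.05) for e1, e2 in rows]

# Filtered result at x_n = 0.8 (Table 3.4).
layer_exponent = decay_exponent(0.762e-2, 0.785e-3, 0.1, 0.05)
```

Rounded to two decimals these agree with the tabulated exponents away from the layer (about 2.05 to 2.06) and with the filtered value reported at $x_n = 0.8$.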

Table 3.3.  Error and decay exponents for (3.4) after 3
iterations without using clipping filter.

 x_n    Error, h = 0.1    Error, h = 0.05    Decay Exponent
 .1     .200E-3           .483E-4            2.05
 .2     .444E-3           .107E-3            2.05
 .3     .738E-3           .177E-3            2.06
 .4     .109E-2           .262E-3            2.06
 .5     .151E-2           .362E-3            2.06
 .6     .201E-2           .481E-3            2.06
 .7     -.161E+0          .620E-3            --
 .8     .635E+0           .785E-3            --
 .9     -.121E+1          .281E+0            --

Table 3.4.  Error and decay exponents for (3.4) after 3
iterations using the simple clipping type filter.

 x_n    Error, h = 0.1    Error, h = 0.05    Decay Exponent
 .1     .200E-3           .483E-4            2.05
 .2     .444E-3           .107E-3            2.05
 .3     .738E-3           .177E-3            2.06
 .4     .109E-2           .262E-3            2.06
 .5     .151E-2           .362E-3            2.06
 .6     .201E-2           .481E-3            2.06
 .7     .859E-4           .620E-3            --
 .8     .762E-2           .785E-3            3.28
 .9     -.241E-2          .153E-2            0.66

The following two examples illustrate the effectiveness
of the defect-correction method (with filter) on problems
involving complicated, multiple boundary layers, turning
points and nonsmooth coefficients.  The equations (3.5)-(3.6)
were taken from Pearson [31] so that an asymptotic solution
was available for comparison.  A change of independent variable
was made to pose the problems on $0 \le x \le 1$ rather than
$-1 \le x \le 1$.  The boundary conditions for each equation were


u(0) = 1,  u(1) = 2.


Example 3.

$\varepsilon u'' + |x - 0.5|\, u' + (2x - 1.5)^3 u = 0$        (3.5)


Example 4.

$\varepsilon u'' + (2x^2 - 2x + 0.25)\, u' + (2x - 1)\, u = 0$.        (3.6)


For these examples, $\varepsilon$ = 0.25E-4 and h = 0.01.

In both cases the method yielded an approximate solution
matching closely the asymptotic solutions found by Pearson
[12] with the same qualitative behavior.  Boundary layers
occurring at the endpoints of the domain were confined to
one mesh interval and internal boundary layers influenced
no more than three mesh points.  The approximate solutions
to (3.5) and (3.6) are illustrated in Figures 6 and 7.

Numerical approximations to the solutions of Examples 3
and 4 were also computed using the streamline diffusion
method described in §3.1.  This method also gave very good
approximations but was more sensitive to the choice of h
and $\delta$.  Specifically, for Example 3 with h = 0.01 the numerical
approximation contained a spurious oscillation (see Figure
9); for h = 0.25E-2, however, the approximation's behavior agreed
with that of the asymptotic solution (Figure 10).

4.  ERROR ESTIMATES FOR THE METHODS.  In this section
we summarize the error estimates that have been proven for
the modified defect correction method and the adjusted
streamline diffusion method.  We consider the 2-D problem
(1.1) and the 1-D problem (1.2).  Here we assume that the
coefficients of (1.1), (1.2) are smooth, $g(x) \ge 0$, the
domain $\Omega$ is "meshlined" and $\varepsilon$ is small w.r.t. acceptable
(outer) meshwidths.

4.1.  The Adjusted Streamline Diffusion Method.  We
begin by considering the method defined in Section 3.1.
Special questions are associated with this method in 2-D.
To isolate these issues, we focus our attention briefly on
the principal part of $L_{\varepsilon,\delta}$, $L^0_{\varepsilon,\delta}$, and its approximation.
Since the principal part of $L_{\varepsilon,\delta}$ contains a cross-derivative
term, we examine its approximation carefully.  Recall that

$L^0_{\varepsilon,\delta}\, u = -(\varepsilon + \delta v_1^2)\, u_{xx} - 2\delta v_1 v_2\, u_{xy} - (\varepsilon + \delta v_2^2)\, u_{yy}$,

$(L^0_{\varepsilon,\delta})^h u_{ij} = -(\varepsilon + \delta v_1^2)\, D_x^2 u_{ij} - 2\delta v_1 v_2\, D^0_{xy} u_{ij} - (\varepsilon + \delta v_2^2)\, D_y^2 u_{ij}$.

Here $D_x^2 = D_x^+ D_x^-$, $D_y^2 = D_y^+ D_y^-$, and $(L^0_{\varepsilon,\delta})^h$ is defined on a uniform
mesh on a meshlined domain $\Omega^h$; see Section 3.1 for more
details.


Theorem 4.1.  [see Ervin and Layton [12]].

(i)  For $\varepsilon > 0$, $\delta \ge 0$, $L^0_{\varepsilon,\delta}$ is an elliptic operator.

(ii)  For $0 < \varepsilon \ll 1$, $\delta > 0$ and general velocity fields
$v = (v_1, v_2)$, $(L^0_{\varepsilon,\delta})^h$ is not a positive type difference
operator.

(iii)  There does not exist a consistent, positive type
approximation to $L^0_{\varepsilon,\delta}$ under the assumptions of (ii).  □

Nevertheless, $(L^0_{\varepsilon,\delta})^h$ does contain interesting mathematical
structures: the interior discrete maximum principle
holds for the operator.

Theorem 4.2.  [see Layton and Morley [22; Thm. 1]].
Assume $\varepsilon > 0$, $\delta \ge 0$.  $(L^0_{\varepsilon,\delta})^h$ is an (inverse) monotone operator
on the interior nodes when the boundary conditions are
homogeneous.  Thus, the interior discrete maximum principle
holds.  □

We are currently working on extending Theorem 4.2 to
the case of lower order terms, including a precise "prescription"
for the proper choice of $\delta$.

In 1-D the situation is much clearer.  Define, for
$x_n = h, 2h, \dots, (N-1)h$  ($h = 1/N$),

$L^h_{\varepsilon+\delta} U_n = -(\varepsilon + \delta_n f^2(x_n))\, D_x^2 U_n + f(x_n)\,(1 - \delta_n f'(x_n) - \delta_n g(x_n))\, D_x U_n + (g(x_n) - \delta_n f(x_n) g'(x_n))\, U_n = 0$,

$U_0 = \alpha$,  $U_N = \beta$,        (4.1)

where $D_x U_n$ is the difference quotient approximating $u'(x_n)$
and $\delta_n$ is chosen to satisfy the conditions:

$\frac{h\,|f(x_n)|}{2\,|\varepsilon + \delta_n f^2(x_n)|} \le 1$,  $\delta_n = O(h)$.        (4.2)

Theorem 4.3.  [Ervin and Layton [12]].  The discrete
maximum principle holds for $L^h_{\varepsilon+\delta}$ provided (4.2) holds and

$g_n - \delta_n f_n g'_n \ge 0$.        (4.3)

□
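
Condition (4.2), read as $h|f(x_n)| / (2|\varepsilon + \delta_n f^2(x_n)|) \le 1$, can be solved for the smallest admissible $\delta_n \ge 0$, namely $\delta_n = \max\{0, (h|f(x_n)|/2 - \varepsilon)/f^2(x_n)\}$. The sketch below illustrates that pointwise choice; it is one possible prescription under this reading of (4.2), not the authors' rule, and the function names are hypothetical.

```python
def min_delta(eps, h, f_n):
    """Smallest delta_n >= 0 with h*|f_n| / (2*(eps + delta_n*f_n**2)) <= 1."""
    if f_n == 0.0:
        return 0.0  # condition (4.2) holds trivially where f vanishes
    return max(0.0, (0.5 * h * abs(f_n) - eps) / f_n**2)

def cell_ratio(eps, h, f_n, delta_n):
    """Left-hand side of condition (4.2)."""
    return h * abs(f_n) / (2.0 * abs(eps + delta_n * f_n**2))

# Coarse mesh relative to the layer: eps = 1e-2, h = 0.1, f = 1.
delta = min_delta(1e-2, 0.1, 1.0)
ratio = cell_ratio(1e-2, 0.1, 1.0, delta)
```

Since this value never exceeds $h/(2|f(x_n)|)$, it is $O(h)$ wherever $|f|$ is bounded away from zero, and it vanishes once the mesh resolves the layer ($h|f| \le 2\varepsilon$).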


We then have the global error estimate:

Theorem 4.4.  [Ervin and Layton [12]].  Assume (4.2),
(4.3) hold.  Then the error in (4.1) satisfies

$\max_{0 \le x_n \le 1} |u(x_n) - U_n| \le C h (\varepsilon + \delta + h) \max_{0 \le x \le 1} |u'''(x)|$.        (4.4)

If $\varepsilon$ is small w.r.t. h and $f(x) \ge a > 0$ we have

$\max_{0 \le x_n \le 1} |u(x_n) - U_n| \le C h$.        (4.5)

In the above, C is independent of $\varepsilon$.  □


(4.5) shows that the method converges linearly, uniformly
on [0,1], and (4.4) implies that when u is smooth in $\varepsilon$
the convergence is essentially quadratic, $O(h^2 + \varepsilon h)$.  The
next result shows that we obtain this high rate of convergence
outside of the layers even when u is singular in $\varepsilon$.
We assume that $f(x) > a > 0$ so that there is an $O(\varepsilon)$ outflow-type
layer at x = 1.  In this case $u_x(1) = O(\varepsilon^{-1})$ as $\varepsilon \to 0$.


Theorem 4.5.  [Ervin and Layton [12]].  In addition to
the assumptions of Theorem 4.4, suppose $f(x) > a > 0$, and
$f(x_n) - \delta_n f(x_n) g(x_n) - \delta_n f(x_n) f'(x_n) \ge a$.  Then, for
$0 \le x_n < 1$:


|u(x  )-u_|  *  Ch2{l+h”2  exp[-b  —  — ] }  +  Ceh{l+e“2  exp[-a  -  — ] } , 

X*  It  W  M  V 


-2 


1-x 


where $C > 0$ is independent of $\varepsilon$, $h$, and $u$, $b = \min\{a, \ln 3\}$,
and $\varepsilon_0 = \max(\varepsilon + \delta_n f^2)$.  □
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We  next  consider  the  problem  of  estimating  derivatives 
of solutions of singularly perturbed B.V.P.'s.  This is an
important problem, as stress intensity factors, skin friction
coefficients, etc. require knowledge of these terms.

Theorem 4.6.  [Ervin and Layton [13]].  Under the assumptions
of Theorem 4.5 we have:

r 

U  ^,-U  .  0  ,  1-x  “• 

! u'  (xn}  ~  *  Ch2{l+h~3exp[-b  -—■]) 

1-x 

+  Ceh{l+e  exp  [-a  — ^-^] } .  □ 

Thus,  away  from  the  layer  we  can  get  good  approximations 
via  the  usual  methods.  At  the  layer  at  x  =  1  (in  our  case) 
we  require  an  exponentially  fitted  difference  quotient. 

Under the assumptions of Theorem 4.6 define

$DU_n = \varepsilon^{-1} \left( \frac{\beta - U_n}{1 - \exp[-f(1)(1 - x_n)/\varepsilon]} \right) f(1)$,  $x_n = 1 - O(h)$.

We then have that the relative error is O(h) (as $u'(1) = O(\varepsilon^{-1})$).
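
A minimal sketch of such an exponentially fitted difference quotient follows. It reads the definition as $DU_n = \varepsilon^{-1} f(1)\,(\beta - U_n) / (1 - \exp[-f(1)(1-x_n)/\varepsilon])$, with $\beta$ taken to be the boundary value at x = 1; that reading and the function name `fitted_derivative` are assumptions.

```python
import math

def fitted_derivative(beta, U_n, x_n, f1, eps):
    """Exponentially fitted estimate of u'(1) from the value U_n at x_n < 1."""
    z = f1 * (1.0 - x_n) / eps
    one_minus_exp = -math.expm1(-z)  # 1 - exp(-z), without cancellation
    return (beta - U_n) * f1 / (eps * one_minus_exp)

# Model layer profile u(x) = c + (beta - c) * exp(-f1 * (1 - x) / eps),
# for which u'(1) = (beta - c) * f1 / eps exactly.
eps, f1, beta, c, x_n = 1e-2, 1.0, 2.0, 0.5, 0.95
U_n = c + (beta - c) * math.exp(-f1 * (1.0 - x_n) / eps)
estimate = fitted_derivative(beta, U_n, x_n, f1, eps)
```

For the pure layer profile used here the quotient reproduces $u'(1)$ up to rounding, whereas an ordinary one-sided difference over a mesh width $h \gg \varepsilon$ would underestimate the $O(\varepsilon^{-1})$ slope by a large factor.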

Theorem 4.7.  [Ervin and Layton [13]].  Under the
assumptions of Theorem 4.6, suppose $x_n$ is chosen so that
$1 - x_n = O(h)$ and $|u(x_n) - U_n| = O(\varepsilon h + h^2)$ uniformly in $\varepsilon$.  Then

U'  (l)-DU 

| - ^-^1  s  C(h+e)  .  □ 

c 

The techniques used to establish these theorems involve
a potpourri of arguments due to Gershgorin [15], barrier function
arguments following Kellogg and Tsan [17], maximum
principle arguments used to validate asymptotic expansions
for B.V.P.'s, see Eckhaus [7], and the theory of monotone
matrices developed by Bramble and Hubbard [4] and Varga [26].

We note that the "adjusted streamline diffusion method"
we study is a finite difference interpretation of the finite
element streamline diffusion method, proposed and studied
by Wahlbin [27], [28] for scalar hyperbolic equations, and
examined for hyperbolic systems by Layton [19], [20], [21],
and Du, Gunzburger and Layton [8].  The finite element
implementation of this circle of ideas was analyzed for (1.1) by
Navert [23] and applied to fluid flow problems by Brooks and
Hughes [5].

4.2.  The Defect-Correction Method.  Global error estimates
for the modified defect-correction method can be
established in 2-D and in 1-D in a similar manner to Theorem
4.4.  These error estimates show that, when the true solution
is smooth uniformly in $\varepsilon$, the error after k iterations in
the basic method (artificial viscosity corrector, second order
defect operator, no filter step) is $O(h^2 + (\varepsilon_0 - \varepsilon)^k)$.  The local
error estimates in 1-D are more interesting as they give an
idea of the spread of the numerical boundary layer from one
iteration to the next.

Theorem 4.8.  [Ervin and Layton [14]].  Let $U^k_n \approx u(x_n)$
be the k-th defect correction approximation to the solution
of (1.1).  Suppose that $f(x) \ge a > 0$, $g(x) \ge 0$ and that
F = I (the filter step is omitted).  Then, for n = 0, 1, ..., N


1— x 

u(xn)“Unl  5  Ch2[l+e“2exp(-a 


)] 


1— x 


+  C(e0-e)k[l+eeQkexp(-a  — 


n 


)] 


-4 


1— x 


+  Ch’U+EQ-expf-b 


for  n  —  1/2  §  m  • • § N-l § 

-Uk  1-x 

lu'  (xn>~  *  Ch2[l+e'3exp(-a  -^1  ] 


1— x_ 


+  C(e0-E)k[l+ee’k"1exp(-a  -^)  ] 

A  C  1“X_ 

+  Ch*  [l+En3exp(-b  — -)]. 


0 

1-x. 


lr  Jr  Jr 

— 2tj  +u“ 

|u«(xn)  -  n+1  2n  .n~A|  s  Ch2 [l+EQ4exp (-a  ] 


1-x 


+  C(e0-e)k[l+eEQk"2exp(-a  -^)  ] 

A  ft  1-Xn 

+  Ch*[H-eftbexp(-b  ———-•)  ] 


where $b = \min\{a, \ln 3\}$.  □

The modified defect correction iteration can also be
implemented via a finite element type procedure.  This case
was studied in Axelsson and Layton [2], where global error
estimates in $L^2(\Omega)$ and $H^1(\Omega)$ were proven.
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A  PLANE  PREMIXED  FLAME  PROBLEM  WITH  TWO-STEP  KINETICS: 


EXISTENCE  AND  STABILITY  QUESTIONS 
Cl. Schmidt-Lainé

CNRS - Département M.I.S., Ecole Centrale de Lyon
69131 ECULLY Cedex - France


Abstract: We consider a nonlinear differential system modelling a two-step
reaction in a plane premixed flame. The unknowns are two functions u and v
(temperature and mass fraction) and a parameter $\delta$ associated with the burning
rate.

For the existence question, we introduce a normalized problem which is first
studied on a bounded interval. Upper and lower solutions induce a priori
estimates which enable us to pass to the limit of a doubly infinite interval. We
obtain the existence of a solution, and we provide an explicit value for $\delta$
which is related to the $L^2$-norm of w = u - v.


Of special interest is the behaviour of the system in a neighbourhood of the
space variable bound $+\infty$. An autonomous, second order homogeneous system
approximates it here, for which the boundary condition $0 \in \mathbb{R}^4$ appears to be a
degenerate fixed point. The problem is embedded in the more general framework
of the stability of the equilibrium point $0 \in \mathbb{R}^n$ for a second order
homogeneous system of dimension n. The homogeneity property allows one to reduce the
dimension of the system by means of a change of the unknowns and variable.

The stationary points of the reduced system are usually hyperbolic, and their
asymptotic analysis can be lifted back to get a stability theorem. These results
are illustrated by the analysis of the combustion problem. For a special
value of a physical real parameter, a bifurcation phenomenon occurs.


I. Physical framework

In a recent paper [4], we introduced a two-step irreversible reaction for a
steady plane flame, with chain-branching/chain-breaking kinetics:

(1)   $A + X \to 2X$,   $2X + M \to 2P + M$

Radical X is obtained in the production step, which has a very large
activation energy $\theta$, and provides product P in the recombination step, for which
the activation energy is taken to be zero; A is the reactant and M a third
body.


This two-step scheme is presented as an alternative to classical single-step
kinetics and allows the description of a wider range of phenomena.

The equations are derived in the stretched flame zone, described by the
one-dimensional space variable $\eta$, $-\infty < \eta < +\infty$, by assuming a fast
recombination, i.e. that both production and recombination of radicals take place in
the same thin zone. The system reads (see [4, p. 423]):

(2)   $u'' = q_1 r \delta (u-v) v\, e^{-u} + \delta (u-v)^2$
      $v'' = r \delta (u-v) v\, e^{-u}$

and the associated boundary conditions are:

(3)   $u = -\eta + o(1)$,  $v = -\eta + o(1)$  as $\eta \to -\infty$
      $u = o(1)$,  $v = o(1)$  as $\eta \to +\infty$

The unknowns are positive functions u and v, representing temperature and
mass fraction of the reactant, and a positive constant $\delta$, representing the
burning rate. The parameters $q_1$ and $q_2$ are the proportions of the total heat
released in the first and second steps of the reaction, so that $q_1 + q_2 = 1$.
Physical considerations require the recombination step to be exothermic, so
that $q_2 > 0$.

Finally r is a positive parameter, corresponding to the ratio of the two
reaction rates. The boundary conditions (3) are obtained by matching with
expansions on either side of the flame sheet.

The mathematical problem is the following: $q_1$, $q_2$ and r > 0 being given,
find two functions u > 0, v > 0 and the constant $\delta > 0$ satisfying (2),
(3). We refer to [4] for a numerical treatment of this problem, leading
to curves in the $(r,\delta)$-plane.

It is particularly convenient for our study to deal with an equivalent
formulation of the system involving the radical mass fraction $w = u - v > 0$.

It consists of the two systems:

(4)   $v'' = r \delta\, v w\, e^{-v} e^{-w}$
      $w'' = -q_2 r \delta\, v w\, e^{-v} e^{-w} + \delta w^2$

and

(5)   $u'' = q_1 r \delta (u-w) w\, e^{-u} + \delta w^2$
      $w'' = -q_2 r \delta (u-w) w\, e^{-u} + \delta w^2$

together with the boundary conditions:

(6)   $v = -\eta + o(1)$,  $w = o(1)$  as $\eta \to -\infty$
      $v = o(1)$,  $w = o(1)$  as $\eta \to +\infty$

and

(7)   $u = -\eta + o(1)$,  $w = o(1)$  as $\eta \to -\infty$
      $u = o(1)$,  $w = o(1)$  as $\eta \to +\infty$

II.  The  existence  question  [1],  [2],  [5] 

Let us introduce the following problem in the x-variable, which is obtained
from (4) by taking $\delta$ equal to 1:

(8)   $\frac{d^2 v}{dx^2} = r\, v w\, e^{-v} e^{-w}$
      $\frac{d^2 w}{dx^2} = -q_2 r\, v w\, e^{-v} e^{-w} + w^2$


We  consider  this  nonlinear  system  subject  to  the  following  boundary  conditions 


v'  -*•  0  as  x  +  °° 
w  -*•  0  as  x  -*■  +  <*> 

together  with  the  normalization  condition 

(10)  v(0)  =  1. 


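We prove the following theorem:

Theorem 1: For any r > 0 fixed, there exists a solution $(v,w) \in (C^0(\mathbb{R}))^2$
to problem (8), (9), (10) such that

(11)  $0 < w < M_0 = r/e$

(12)  $v > 0$;  $v' < 0$;  $v \to +\infty$ as $x \to -\infty$;  $v \to 0$ as $x \to +\infty$.

Moreover, $w \in H^2(\mathbb{R})$ and, as $x \to -\infty$, $v(x) = -\ell(x - x_0) + \text{E.S.T.}$, where
$x_0 \in \mathbb{R}$ and

(13)  $\ell = \frac{1}{q_2} \int_{-\infty}^{+\infty} w^2(x)\,dx = \frac{1}{q_2} \|w\|^2_{L^2(\mathbb{R})}$.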


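To return to the initial problem (4), (6), we make the transformation

(14)  $\eta = \ell (x - x_0)$

Then (8) becomes

(15)  $\frac{d^2 v}{d\eta^2} = \frac{1}{\ell^2}\, r\, v w\, e^{-v} e^{-w}$
      $\frac{d^2 w}{d\eta^2} = -\frac{1}{\ell^2}\, q_2 r\, v w\, e^{-v} e^{-w} + \frac{1}{\ell^2}\, w^2$

and, as $\eta \to -\infty$, $v(\eta) = -\eta + \text{E.S.T.}$ Thus it is clear that (4), (6) is solved
by the pair $(v(\eta), w(\eta))$ with

(16)  $\delta = 1/\ell^2$

Therefore, we have determined

(17)  $\delta = q_2 \Big/ \int_{-\infty}^{+\infty} w^2(\eta)\,d\eta$.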
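
As a consistency check, the value of $\ell$ in (13) and the two expressions (16) and (17) for $\delta$ can be related by integrating (8) over the real line. The sketch below reads (8) as $v'' = r v w e^{-v-w}$, $w'' = -q_2 r v w e^{-v-w} + w^2$, and assumes $v'(x) \to -\ell$ as $x \to -\infty$, $v'(x) \to 0$ as $x \to +\infty$ and $w'(x) \to 0$ as $x \to \pm\infty$; these decay properties are assumptions consistent with Theorem 1, not statements taken from it.

```latex
\begin{align*}
\int_{-\infty}^{+\infty} v''\,dx
  &= v'(+\infty) - v'(-\infty) = \ell
   = \int_{-\infty}^{+\infty} r\,v\,w\,e^{-v-w}\,dx, \\
0 = \int_{-\infty}^{+\infty} w''\,dx
  &= -q_2\,\ell + \int_{-\infty}^{+\infty} w^2\,dx
  \quad\Longrightarrow\quad
  \ell = \frac{1}{q_2}\int_{-\infty}^{+\infty} w^2(x)\,dx, \\
\int_{-\infty}^{+\infty} w^2(\eta)\,d\eta
  &= \ell \int_{-\infty}^{+\infty} w^2(x)\,dx = q_2\,\ell^2
  \quad\Longrightarrow\quad
  \frac{q_2}{\int_{-\infty}^{+\infty} w^2(\eta)\,d\eta} = \frac{1}{\ell^2} = \delta .
\end{align*}
```

The last line uses $d\eta = \ell\,dx$ from the change of variable $\eta = \ell(x - x_0)$.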

In [2], we give a detailed demonstration of Theorem 1. A sketch of the proof
is the following: First, we exhibit positive upper and lower solutions of
the problem in a formal way. Next, we consider the system (8) on a bounded
interval [-a, b] with suitable boundary conditions and we prove the
existence of a solution (v,w) in a closed convex set K, by a fixed point argument
(the convex set K involves the upper and lower solutions). We derive a priori
estimates in the $C^1$-norm as well as in the $H^1$-norm. Finally, we let a and b
tend to $+\infty$, and prove the existence of a limit which solves (8), (9), (10).
Properties (12), (13) appear as a "spin off" of the existence proof. An
alternative proof by a topological shooting method has been presented by
S.P. Hastings, C. Lu and Y.H. Wan [3].


III. The stability question [5], [6]

System (5) can be rewritten in the canonical form:

(18)  $u' = p$
      $p' = q_1 r \delta (u-w) w\, e^{-u} + \delta w^2$
      $w' = q$
      $q' = -q_2 r \delta (u-w) w\, e^{-u} + \delta w^2$

for which $0 \in \mathbb{R}^4$ appears to be a fixed point, derived from the boundary
conditions (7). Linearizing (18) about this point yields four zero
eigenvalues; such a degenerate fixed point requires more sophisticated treatment.
Let us then consider the second order approximation of (18) near this point:

(19)  $u' = p$
      $p' = q_1 r \delta (u-w) w + \delta w^2$
      $w' = q$
      $q' = -q_2 r \delta (u-w) w + \delta w^2$

This system only contains quadratic terms and can be considered in the
general form of a second order homogeneous problem.


So, we consider the problem

(20)  $\frac{d^2 X}{d\eta^2} = F(X)$

where $X \in \mathbb{R}^n$ and F is a second order homogeneous function. More precisely,
we study the stability of the degenerate fixed point $0 \in \mathbb{R}^{2n}$ of the
autonomous dynamical system:

(21)  $X' = Y$
      $Y' = F(X)$

The homogeneity property allows one to reduce the dimension of problem (21), by
means of the change of functions:

(22)  $x = \frac{X}{\|X\|}$;  $y = \frac{Y}{\|X\|^{3/2}}$

such that $(X,Y) \in \mathbb{R}^n \times \mathbb{R}^n \mapsto (x,y) \in S^{n-1} \times \mathbb{R}^n$,
where $S^{n-1}$ is the unit sphere of $\mathbb{R}^n$, together with the change of variable
defined by

(23)  $\frac{dt}{d\eta} = \|X\|^{1/2}$

The reduced autonomous dynamical system is the following:

(24)  $\frac{dx}{dt} = y - x\,(x \cdot y)$
      $\frac{dy}{dt} = F(x) - \frac{3}{2}\, y\,(x \cdot y)$

where $(\,\cdot\,)$ denotes the Euclidean scalar product in $\mathbb{R}^n$, and $\|\cdot\|$ the
associated norm.

Then we prove existence of at least a pair of symmetric fixed points of
(24), $P_\pm = (x_\pm, y_\pm) \in S^{n-1} \times \mathbb{R}^n$, and exhibit the mapping between trajectories
of (21) and trajectories of (24). An essential remark is that a trajectory
of (24) provides by lifting a family of trajectories of (21), due to an
arbitrary integration parameter. The stability result is the application of
the former property to invariant manifolds:


Theorem 2: Let $P_0 = (x_0, y_0) \in S^{n-1} \times \mathbb{R}^n$, n > 1, be a fixed point of
(24), such that $(x_0 \cdot y_0) < 0$. Then

(i)  The stable manifold $W^s(P_0)$ of problem (24) lifts to the stable
manifold $W^s(0)$ of homogeneous problem (21).

(ii)  $\dim W^s(0) = \dim W^s(P_0) + 1$.


The main interest of this result is that investigation of the stable
manifold $W^s(P_0)$ of problem (24) is often accessible by classical tools, namely
by linearization, because the fixed points of (24) are usually hyperbolic.

A complete stability analysis of fixed points $P_0$ of (24) is presented in [6],
leading to results on the dimension of the stable manifold $W^s(0)$ of (21).
These results are summarized as follows:


Theorem 3: Let us suppose that $F'(x_0)$ is similar to a diagonal matrix
and that its eigenvalues $\lambda_j$ are such that $\lambda_j = a_j + i b_j$:


6b?  *  25  p  (p-a.*) 

(25)  3  (  )2J 


o  -o 


Then  by  lifting  of  W®(P0) 

(26)  dim  W3 (0)  -  J 

je  J 

where 


to  W®(0) 
ra(Aj) 


J  *»  {  j  e  IN  /  6b?  >  25  p  (p-a^) } 

and $m(\lambda_j)$ is the geometrical multiplicity of the eigenvalue $\lambda_j$.

Applying this general method to (19) leads to a reduced system for which
two fixed points $P_0$ and $P_1$ are obtained, verifying $(x_0 \cdot y_0) < 0$, $(x_1 \cdot y_1) < 0$.

For r = 1, the two fixed points are no longer distinct and this unique fixed
point is degenerate. For $r \ne 1$, we obtain by linearization and lifting to (19):

(27)  $\dim W^s(0) = 2$

The transcritical bifurcation case r = 1 is solved by means of the Center
Manifold Theorem.
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Bistable  response  is  common  in  many  situations;  one  is  the 
equilibrium temperature in a catalyst particle [5], as illustrated
schematically in figure 1.  The surface temperature of the pellet
is the control parameter which chooses between the high and low
temperature branches.  Numerical calculations [4,5] and formal
asymptotic [2] studies of a simplified model have shown that
oscillating this control parameter at a sufficiently high
frequency permits the pellet temperature to remain on the lower
branch, even in the face of perturbations that otherwise would
cause an undesirable jump to the higher branch.

We derive similar conclusions from a rigorous differential
inequality analysis.  This analysis also reveals that periodic
oscillations are not necessary; any sort of oscillation suffices,
provided its time integral is sufficiently small.  These results
are briefly sketched here.  Complete details and their extension
to a general class of problems will appear in [1].

Figure 1: Equilibrium temperature v in a catalyst particle
vs. pellet surface temperature $\lambda$.  The solid lines
are stable, the dashed unstable.  The point $(v^0, \lambda^0)$
is a stable low temperature operating point, and
$(v^*, \lambda^*)$ is the maximum low temperature operating
point.


A simplified model of the catalyst particle is

$v'(t) = \lambda - g(v)$,

where v is the spatially uniform temperature of the pellet's
interior and $\lambda$ is its surface temperature; see [2,5].  The
graph of g is sketched in figure 2.  Its form reflects the
multiplicity of states possible from the nonlinear interaction of
pellet temperature and reactant kinetics.

Figure 2: The nonlinear term g(v) in (1) vs. v.

The operating point $(v^0, \lambda^0)$ shown in figure 1 is a low
temperature state; $(v^*, \lambda^*)$ is the maximum low temperature
operating point.  The primary result of our analysis is that
substantial perturbations of the lower equilibrium temperature
$v^0$ can be bounded away from the corresponding higher equilibrium
value $v_0$ shown in figure 1 if the boundary temperature $\lambda$ is
oscillated about $\lambda^0$.

Specifically, define $\lambda(t) = \lambda^0 + \theta\phi(t)$ for some arbitrary
function $\phi$.  Let v(t) denote the solution of the perturbed
system

(1)  $v'(t) = \lambda(t) - g(v)$,

(2)  $v(0) = v^0 + \theta V$,

where $\theta$, V are positive constants.  Let $\Phi(t) = \int_0^t \phi(s)\,ds$.

Then we show in [1] that the solution v(t) of (1-2) satisfies

(3)  $v(t) \le v^* + \theta\Phi(t) + \theta V$

provided

(4)  $\theta(\|\Phi\| + V) < [2(\lambda^0 - \lambda^*)/g'']^{1/2}$.

Here, $\|\Phi\| = \sup\{ |\Phi(t)| : 0 \le t < \infty \}$ and $g''$ is evaluated at
its maximum on $[v^0, v^*]$.  The proof is a simple differential
inequality argument.

The bound (3) involves the maximum temperature on the low
temperature branch, the time integral of the oscillatory part of
the boundary temperature term $\lambda(t)$, and the perturbation $\theta V$.

The condition (4) requires that $\|\Phi\|$ be small.  If the
oscillations in the surface temperature are sinusoidal, e.g., if
$\phi(t) = \sin \omega t$, then $\|\Phi\| \sim 1/\omega$ and (4) requires that the
frequency be large.
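
For the sinusoidal case the integral can be written out: $\Phi(t) = (1 - \cos \omega t)/\omega$, so $\|\Phi\| = 2/\omega$, and condition (4) becomes a lower bound on the frequency. The sketch below assumes the symbols $\theta$, V and $\Phi$ as read above, and writes `bound` for the right-hand side $[2(\lambda^0 - \lambda^*)/g'']^{1/2}$ of (4); the function names are hypothetical.

```python
import math

def integral_of_sine(t, omega):
    """Phi(t) = integral from 0 to t of sin(omega * s) ds."""
    return (1.0 - math.cos(omega * t)) / omega

def min_frequency(theta, V, bound):
    """Threshold omega above which theta * (2/omega + V) < bound holds.

    Returns None when theta * V >= bound, in which case no frequency
    can satisfy the condition.
    """
    slack = bound - theta * V
    if slack <= 0.0:
        return None
    return 2.0 * theta / slack

# Sup of |Phi| over a fine grid covering one period, versus 2/omega.
omega = 3.0
sup_phi = max(abs(integral_of_sine(k * 1e-3, omega)) for k in range(3000))
threshold = min_frequency(0.1, 1.0, 0.5)
```

With these illustrative numbers the condition holds for every frequency strictly above the returned threshold, and fails for every frequency when the initial perturbation $\theta V$ alone already reaches the bound.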

This suggestion that there is a critical lowest frequency
which stabilizes the low temperature branch is consistent with
the numerical and asymptotic evidence [5,2].  Indeed, for
$0 < \theta \ll 1$, Cohen and Matkowsky [2] found an expression for this
critical frequency that is similar in form to (4).

However, (4) can certainly be satisfied when $\phi$ is other
than sinusoidal; $\phi$ need not even be periodic.  Any sort of
oscillatory strategy will suffice so long as (4) holds.

These ideas are extended in [1] to a general class of
problems of the form

$v' = f(v, \lambda)$,

where f is only required to exhibit a local equilibrium that
loses stability for $\lambda$ sufficiently large.  Complete proofs and
applications to other physical problems, such as stabilizing
phase transitions [6,7], appear in [1] as well.
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A.K.  Kapila  and  G.  Ledder 
Department of Mathematical Sciences
Rensselaer  Polytechnic  Institute 
Troy,  New  York  12180-3590 


ABSTRACT.  The unsteady response of a plane, adiabatic flame to a
temporal gradient in the enthalpy of the reacting medium is studied.  The
characteristic time of the gradient is taken to be of the same order as
the natural time scale of the flame.  The mathematical formulation leads
to a moving boundary problem which must be treated numerically.  The
results show how the burning rate responds to variations in Lewis number,
amplitude of the gradient, and characteristic time of the gradient.

I.  INTRODUCTION.  When a plane flame propagates through a combustible
mixture whose state is uniform, it does so at a constant speed.
In many practical applications, however, the state of the fresh mixture
exhibits spatial and/or temporal nonuniformities.  These nonuniformities
may occur, for example, in temperature, reactant concentration, or both.
The flame will then propagate in an unsteady fashion.

The flame response depends crucially upon how the natural scales of
the flame (i.e. the diffusion length and time scales) compare with the
characteristic scales of the nonuniformity.  If the scales of the
nonuniformity are relatively long, the framework of Slowly-Varying Flames (SVFs)
applies (see [1], Chap. 3).  That problem is the subject of recent work by
Bissett and Reuss [2], who have undertaken an analysis in the limit of
large activation energy ($\theta \to \infty$).  They assumed that enthalpy variations in
the fresh mixture have a characteristic length $\theta$ times larger than the
flame thickness, and amplitude $O(1/\theta)$ relative to that of the
undisturbed state; it is well known that $O(1/\theta)$ fluctuations in the flame
temperature can lead to O(1) fluctuations in the burning rate.  Their
analysis leads to an ordinary differential equation for the time variation
of the burning rate.  A study of this equation for Lewis numbers less than
unity (the stable regime for planar flames subject to planar
perturbations) reveals that the flame exhibits a delayed response to the
enthalpy fluctuations, but that this sluggishness disappears as the Lewis
number approaches unity.  Bissett and Reuss also considered the effect of
heat loss, since they were particularly concerned with the unsteady
behavior of the flame near extinction.

The lively response of the flame for near-unity Lewis numbers is also
confirmed by the analysis of Mikolaitis [3], who treats $O(1)$ variations
in upstream enthalpy, and hence exponentially large fluctuations in
burning rate. Only positive enthalpy gradients are considered, and the
characteristic scale of the nonuniformity is taken to be the same as the
initial flame thickness. However, the problem quickly falls into the SVF
mold, since the flame thickness shrinks exponentially as the flame
encounters an $O(1)$ increase in upstream enthalpy. Mikolaitis finds that the
flame adjusts 'instantaneously' to the local state ahead of the preheat
zone.


In the SVF limit, the flame retains its quasisteady structure, and
that is what makes the problem analytically tractable. This is no longer
the case when one considers $O(1/\theta)$ perturbations with a characteristic
scale comparable to the scale of the flame. Now the burning rate
variation is only $O(1)$, so that Mikolaitis' analysis does not apply. In
fact, $O(1)$ time variations intrude into the preheat zone, and the flame
becomes genuinely unsteady. Further analytical progress is not possible
and one must resort to numerics.

This paper is concerned with purely temporal nonuniformities in the
state of the medium, occurring on a time scale comparable to the time the
flame takes to travel a distance equal to its thickness. Spatial
stratification is not considered here; it requires a different treatment,
in contrast to the SVF framework, where spatial and temporal fluctuations
are equivalent. The aim here is to determine the burning-rate response of
the flame as a function of Lewis number, fluctuation amplitude, and
fluctuation time scale. The mathematical problem, it turns out, involves a
moving boundary, but it can be solved by standard numerical techniques.

2. GOVERNING EQUATIONS. It is convenient to adopt the
Near-Equidiffusional Flame (NEF) formulation (see [1], Chap. 3), which can
be derived from the full combustion equations with Arrhenius kinetics
under the assumptions of large activation energy ($\theta \to \infty$), near-unity Lewis
number and nearly uniform enthalpy, i.e.,

$$L^{-1} = 1 - \epsilon \ell / \sigma, \qquad H = T + \sigma Y = 1 + \sigma + \epsilon h, \qquad \epsilon \equiv (1 + \sigma)^2 / \theta \to 0. \tag{1}$$

It is also convenient to employ a density-weighted spatial coordinate
travelling with the flame. Then, to leading order in $\epsilon$, the governing
equations are

$$T_t + M T_x = T_{xx} \ \text{ for } x < 0, \qquad T = 1 + \sigma \ \text{ for } x > 0, \tag{2a}$$

$$h_t + M h_x = h_{xx} + \ell T_{xx} + S(t) \ \text{ for } x < 0, \tag{2b}$$

$$S(t) = h_f'(t), \tag{2c}$$

with boundary conditions

$$T \to 1, \quad h \to h_f(t) \ \text{ as } x \to -\infty, \qquad h_x \to 0 \ \text{ as } x \to \infty, \tag{3}$$

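jump conditions

$$\delta T = \delta h = 0, \qquad \delta(h_x + \ell T_x) = 0, \qquad \delta T_x = -\sigma \exp(h/2) \ \text{ at } x = 0, \tag{4}$$

and initial conditions

$$T = 1 + \sigma e^{x}, \quad h = -\ell x e^{x} \ \text{ for } x < 0; \qquad T = 1 + \sigma, \quad h = 0 \ \text{ for } x > 0. \tag{5}$$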


In equations (4) above, $\delta F$ is defined as follows:

$$\delta F = F(0^+, t) - F(0^-, t).$$


Several  remarks  about  the  governing  equations  are  in  order. 

(i) The symbols $T$, $H$, $M$, $L$ and $\sigma$ denote, respectively, the
temperature, enthalpy, burning rate, Lewis number (ratio of thermal
diffusivity to mass diffusivity) and the heat-release parameter. The symbols
$h$ and $\ell$, defined by equations (1) above, denote a reduced enthalpy and a
reduced Lewis number respectively, and represent small departures,
measured on the $\epsilon$-scale, from constant values.

(ii) The zero-fluctuation state of the fresh mixture, also the
initial state, is chosen as the reference for nondimensionalization, and is
given by

$$T = 1, \quad h = 0, \quad M = 1.$$

As already mentioned, the spatial coordinate is density-weighted, and the
dimensionless thermal diffusion coefficient is allowed to vary according
to the prescription

$$\lambda / c_p = T,$$

where $\lambda$ is the thermal conductivity and $c_p$ the specific heat. For the
sake of brevity, details of the nondimensionalization process are omitted
here, but they are quite standard and can be found, for example, in [4].

(iii) It is assumed that the enthalpy of the medium undergoes purely
temporal fluctuations with $O(\epsilon)$ amplitude, i.e., far ahead of the flame,

$$H = 1 + \sigma + \epsilon h_f(t).$$

The source term $S(t)$ in equation (2b), defined by (2c), is included to
ensure that equation (2b) balances at $x = -\infty$. The enthalpy
nonuniformity may be due to fluctuations in temperature $T$, reactant concentration
$Y$, or both. Also, it is worth noting that since the amplitude of the
nonuniformity is $O(\epsilon)$, the upstream boundary condition for temperature in
equation (3) is unperturbed.

In the computations presented below, the enthalpy variation is
assumed to be

$$h_f(t) = h_\infty \left[ 1 - \exp(-t/t_f) \right]^2, \qquad t > 0,$$

i.e., the reduced enthalpy of the fresh mixture varies smoothly and
monotonically from the value

$$h_f(0) = 0$$

to the value

$$h_f(\infty) = h_\infty.$$

The time constant $t_f$ determines the rate of variation.
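The ramp above, and the quasisteady burning rate it implies, can be tabulated with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the decaying exponential is what makes $h_f$ approach $h_\infty$, and the relation $M = \exp(h_f/2)$ is an assumption obtained by applying the flux condition in (4) to a steady flame. It agrees with the quasisteady response running from 1 to 2 for $h_\infty = \ln 4$ in the results below, but it is not the unsteady solution.

```python
import math


def enthalpy_ramp(t, h_inf, t_f):
    """Reduced fresh-mixture enthalpy h_f(t) = h_inf * (1 - exp(-t/t_f))**2 for t >= 0."""
    if t <= 0.0:
        return 0.0
    return h_inf * (1.0 - math.exp(-t / t_f)) ** 2


def quasisteady_burning_rate(t, h_inf, t_f):
    """Assumed quasisteady (instantaneous) response M = exp(h_f(t) / 2)."""
    return math.exp(0.5 * enthalpy_ramp(t, h_inf, t_f))


if __name__ == "__main__":
    h_inf, t_f = math.log(4.0), 1.0  # values used for Fig. 1 in the text
    for t in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
        h = enthalpy_ramp(t, h_inf, t_f)
        m = quasisteady_burning_rate(t, h_inf, t_f)
        print(f"t = {t:5.1f}   h_f = {h:.4f}   M_qs = {m:.4f}")
```

Shortening `t_f` steepens the ramp without changing its end points, which is the parameter variation examined in Fig. 3.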

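It is convenient to define a modified enthalpy variable according to the
prescription

$$\phi = h - h_f(t),$$

which leaves (2a) unchanged and allows the remainder of the system (2) -
(4) to be rewritten as

$$\phi_t + M \phi_x = \phi_{xx} + \ell T_{xx} \ \text{ for } x \neq 0, \tag{6a}$$

$$T \to 1, \quad \phi \to 0 \ \text{ as } x \to -\infty, \qquad \phi_x \to 0 \ \text{ as } x \to \infty, \tag{6b}$$

$$\delta T = \delta \phi = \delta(\phi_x + \ell T_x) = 0, \qquad \delta T_x = -\sigma \exp\left[ (\phi + h_f)/2 \right] \ \text{ at } x = 0. \tag{6c}$$

The initial conditions (5) remain unchanged, with $\phi$ replacing $h$.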

3. NUMERICS. The goal of this work is to study the behavior of the
burning rate $M(t)$ as a function of the parameters $\ell$, $h_\infty$ and $t_f$. The
above equations define a moving boundary problem, which was treated
numerically as follows. The doubly infinite problem was discretized on the
finite interval $[-A, B]$, $A, B > 0$, according to the Crank-Nicolson
scheme. At each time step a provisional value of $M$ was assumed and the
governing differential equations for $T$ and $\phi$ were integrated by using all
the jump conditions in (6c) except the last. The remaining condition was
then incorporated into a Muller root finder to iterate to the correct
value of $M$.
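To make the time-stepping idea concrete, the sketch below applies a Crank-Nicolson step to the plain diffusion equation $u_t = u_{xx}$ on a fixed interval with zero boundary values. It is a stand-in model problem only: the convection term, the jump conditions at the flame sheet and the iteration on $M$ are all omitted, and the function names, grid and step sizes are illustrative assumptions rather than the authors' code.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b, super-diagonal c, rhs d)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def crank_nicolson_step(u, r):
    """Advance u_t = u_xx by one step with u = 0 at both ends; r = dt / dx**2."""
    n = len(u) - 2  # number of interior unknowns
    a = [-0.5 * r] * n
    b = [1.0 + r] * n
    c = [-0.5 * r] * n
    # Explicit half of the scheme evaluated at the old time level.
    d = [0.5 * r * u[i - 1] + (1.0 - r) * u[i] + 0.5 * r * u[i + 1]
         for i in range(1, n + 1)]
    return [0.0] + thomas(a, b, c, d) + [0.0]


def solve_heat(nx=51, dt=1.0e-3, steps=100):
    """Evolve u(x, 0) = sin(pi x) on [0, 1]; the exact solution decays as exp(-pi**2 t)."""
    dx = 1.0 / (nx - 1)
    u = [math.sin(math.pi * i * dx) for i in range(nx)]
    u[0] = u[-1] = 0.0
    r = dt / dx ** 2
    for _ in range(steps):
        u = crank_nicolson_step(u, r)
    return u


if __name__ == "__main__":
    u = solve_heat()
    print("numerical:", u[25], " exact:", math.exp(-math.pi ** 2 * 0.1))
```

In the flame problem the same implicit solve would be repeated inside the iteration on $M$, since the convection term changes with each provisional burning rate.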

For each choice of parameter set ($\ell$, $h_\infty$, $t_f$), some experimentation
was found to be necessary to determine the numerical boundary locations $A$
and $B$, but the largest range ever used corresponded to $A + B = 18$.
Almost all the calculations employed $\Delta t = 0.20$ and $\Delta x = 0.04$, and
needed no more than three Muller iterations for convergence. The accuracy
of the numerical scheme was tested by comparing the numerical results with
asymptotic analytical results obtained in the limit $\ell \to 0$, and separately,
in the limit $h_\infty \to 0$. In each limiting case the problem linearizes
and can be solved analytically by using Laplace transformation.
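For readers unfamiliar with Muller's method, a minimal sketch follows. It fits a parabola through the three most recent iterates and moves to the parabola root nearest the latest one. The residual used here is a toy cubic chosen only so the example runs; in the flame computation the residual would be the mismatch in the last jump condition of (6c) as a function of the provisional $M$. All names are hypothetical.

```python
import cmath


def muller(f, x0, x1, x2, tol=1e-12, max_iter=50):
    """Find a root of f by Muller's method, starting from three distinct guesses."""
    for _ in range(max_iter):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        h1, h2 = x1 - x0, x2 - x1
        d1, d2 = (f1 - f0) / h1, (f2 - f1) / h2
        a = (d2 - d1) / (h2 + h1)          # curvature of the interpolating parabola
        b = a * h2 + d2                    # slope of the parabola at x2
        c = f2
        disc = cmath.sqrt(b * b - 4.0 * a * c)
        # Pick the larger denominator so the step is the smaller of the two roots.
        denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        if denom == 0:
            break
        dx = -2.0 * c / denom
        x0, x1, x2 = x1, x2, x2 + dx
        if abs(dx) < tol:
            break
    return x2


if __name__ == "__main__":
    def toy_residual(m):
        return m ** 3 - m - 2.0            # stand-in residual, not the flame condition

    root = muller(toy_residual, 1.0, 1.5, 2.0)
    print("root:", root, " residual:", abs(toy_residual(root)))
```

The iterate is returned as a complex number because the parabola may have complex roots; for a real problem such as the burning rate the real part is the quantity of interest.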

4. RESULTS. The numerical results are displayed in Figures 1-3.
Each figure displays the effect on $M$ of a single parameter in the triad
($\ell$, $h_\infty$, $t_f$), while the other two are kept fixed. All runs were computed
at $\sigma = 4$.


(i) Effect of $\ell$.

Fig. 1 reveals the effect of variation of $\ell$ upon $M(t)$. In this
figure $t_f$ is fixed at the value unity, and $h_\infty$ at $\ln 4$, so that $M$
varies from 1 to 2 as time increases from 0 to $\infty$. The dotted
curve corresponds to the quasisteady, or instantaneous, response. For
large and negative values of $\ell$ the actual flame response lags behind
the quasisteady response over most of the time interval. However, the
flame becomes livelier as $\ell$ increases, and eventually, for $\ell$
sufficiently large and positive, the burning rate overshoots the ultimate
value of 2 and a decaying oscillation appears. (The appearance of the
oscillation is the precursor to eventual instability of the steady flame
in favor of pulsatile motion.)

(ii) Effect of $h_\infty$.

Figs. 2(a,b,c) are drawn at $t_f = 1$ and $\ell = 0$, and display the
variation of flame response with $h_\infty$.

(iii) Effect of $t_f$.

In Fig. 3, $h_\infty$ and $\ell$ are set at respective values $\ln 4$ and zero,
while $t_f$ is changed from 1 to 1/4. For the shorter value of $t_f$ the
burning rate shows an overshoot, indicating that larger enthalpy gradients
provoke a stronger response.


A 2-Dimensional Scalar Chandrasekhar Filter for Image Restoration


A.  K.  Mahalanabis  and  Kefu  Xue 

Department  of  Electrical  Engineering 
The  Pennsylvania  State  University 
University  Park,  Pa  16802 


Abstract. Based on the previous work on the 2-dimensional (2D) strip
Chandrasekhar filter (CF) algorithm by Mahalanabis and Xue [1], a more efficient and accurate
scalar-format 2D CF algorithm is described in this paper. The filtering algorithm is
developed for an image modeled by a Non-Symmetric Half Plane (NSHP) model. Unlike
the conventional Kalman filtering (KF) algorithm, which uses Riccati-type difference
equations, this algorithm is based on Chandrasekhar-type difference equations, which
give the algorithm better numerical properties and computational efficiency. The
computational requirements of the scalar CF algorithm, the scalar KF algorithm and the suboptimal
reduced update Kalman filtering (RUKF) algorithm developed by Woods and Radewan
[2]-[3] are evaluated and compared. The comparison shows that the scalar CF algorithm
costs less than 10% of the computational effort that the scalar KF algorithm needs and
less than 30% of that which the RUKF algorithm needs. An experiment on a simulated image
demonstrates the noise reduction and numerical stability of the algorithm.

i. Introduction


In the previous work [1], we developed a relatively efficient recursive suboptimal
filtering algorithm for reducing the noise in image data, the so-called strip
Chandrasekhar filtering (CF) algorithm. Since the CF algorithm reduces the computational
requirement to the greatest extent when the observation quantity is a scalar, the aim of
this paper is to develop a 2-dimensional (2D) CF algorithm which processes and restores
one image pixel at a time. This is the so-called scalar 2D CF algorithm. The 2D scalar CF
algorithm not only cuts more than 90% of the computational effort compared with the
conventional KF algorithm, but also yields the optimal filtering result. A 2D version of
the scalar CF algorithm for an image modeled by a Non-Symmetric Half Plane (NSHP)
model is developed and analysed in the following sections.

In section ii, the noise reduction filtering problem is analytically formulated. An
M x M order NSHP model is considered for the noise-free data. The observed image data
are corrupted with zero-mean white noise. The global state space model proposed by Woods
and Radewan [2]-[3] is adopted.

Section iii is devoted to the derivation of the 2D scalar CF algorithm and the analysis of
its computational requirement. In section iv, the computational requirements of the scalar KF
algorithm and the suboptimal reduced update Kalman filter (RUKF) developed by Woods
and Radewan are evaluated. The computational requirements are expressed in terms of the
order M of the NSHP model as well as the image size index N. These algorithms are scalar
processors, so the computational requirements can be compared with respect to the
number of operations per pixel restoration. Since the number of memory access operations
is strongly dependent on computer structure, the investigation of this requirement is outside
the scope of this paper. Compared with the operation time of multiplication and addition,
the logic operation time is negligible, and the algorithms involve only a very limited number
of logic operations. Therefore the comparison of computational requirements will focus only
on the number of multiplications and additions per pixel restoration. Section v contains
the results of simulation studies and section vi serves as the conclusion.


ii. Problem Formulation

The image to be processed consists of N x N equally spaced gray-level pixels. The
noise-free image is expressed by an array $\{g(i,j);\ 1 \le i,j \le N\}$, where i and j are vertical and
horizontal pixel location indices, respectively. The observed image array $\{z(i,j);\ 1 \le i,j \le N\}$
is corrupted with an additive noise array $\{v(i,j);\ 1 \le i,j \le N\}$, which is a white zero-mean
random field with variance $\sigma_v^2$. It is assumed that the noise-free image $\{g(i,j);\ 1 \le i,j \le N\}$
can be represented by a zero-mean discrete Markov random field which is modeled by an
autoregressive-type NSHP predictive model. Because almost all images have only limited
correlation distance, this assumption is reasonable. For an M x M order NSHP model,
the present pixel value can be linearly related to its specified neighboring pixels:

$$g(i,j) = \sum_{n=1}^{M} a(0,n)\, g(i, j-n) + \sum_{m=1}^{M} \sum_{n=-M}^{M} a(m,n)\, g(i-m, j-n) + w(i,j), \tag{1}$$


where $1 \le i,j \le N$ and the $a(m,n)$ are the coefficients of the NSHP model. $\{w(i,j);\ 1 \le i,j \le N\}$
is a white zero-mean random field with variance $\sigma_w^2$.
The observed image can be expressed as follows:

$$z(i,j) = g(i,j) + v(i,j), \qquad 1 \le i,j \le N. \tag{2}$$

Adopting the global state vector developed by Woods and Radewan [2]-[3] for the
raster-scanned image (i.e., left to right, advance one line, then repeat), the $(NM + M) \times 1$
state vector $x(i,j)$ is defined as follows:

$$\begin{aligned}
x^T(i,j) = [\, & g(i,j),\ g(i,j-1),\ \ldots,\ g(i,1);\ g(i-1,N),\ g(i-1,N-1),\ \ldots,\ g(i-1,1); \\
& \ldots;\ g(i-M+1,N),\ g(i-M+1,N-1),\ \ldots,\ g(i-M+1,1); \\
& g(i-M,N),\ g(i-M,N-1),\ \ldots,\ g(i-M,j-M+1) \,],
\end{aligned} \tag{3}$$

where N is the image size index and M is the order of the NSHP model of the noise-free image,
so that the dimension of the state vector is $(MN + M) \times 1$. Note that the elements of the state vector
are the pixel values of the raster-scanned noise-free image data.
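As an illustration of the NSHP model (1) and the observation equation (2), the sketch below generates a small noise-free field with a 1 x 1 order model in raster-scan order and adds white observation noise. The coefficient values, the zero boundary convention for pixels outside the image and the function name are assumptions made only for this example; they are not taken from the paper.

```python
import random


def simulate_nshp_1x1(n, a01, a1m1, a10, a11, sigma_w, sigma_v, seed=0):
    """Return (g, z): a 1 x 1 NSHP field g and its noisy observation z = g + v.

    Pixels outside the n x n image are treated as zero (assumed boundary condition).
    """
    rng = random.Random(seed)
    g = [[0.0] * n for _ in range(n)]
    z = [[0.0] * n for _ in range(n)]

    def pixel(i, j):
        return g[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    for i in range(n):              # raster scan: left to right, then next line
        for j in range(n):
            g[i][j] = (a01 * pixel(i, j - 1)            # a(0, 1)  g(i, j-1)
                       + a1m1 * pixel(i - 1, j + 1)     # a(1, -1) g(i-1, j+1)
                       + a10 * pixel(i - 1, j)          # a(1, 0)  g(i-1, j)
                       + a11 * pixel(i - 1, j - 1)      # a(1, 1)  g(i-1, j-1)
                       + rng.gauss(0.0, sigma_w))       # driving noise w(i, j)
            z[i][j] = g[i][j] + rng.gauss(0.0, sigma_v)  # observation noise v(i, j)
    return g, z


if __name__ == "__main__":
    g, z = simulate_nshp_1x1(8, 0.5, 0.1, 0.3, -0.1, sigma_w=1.0, sigma_v=0.7)
    print("g[3][:4] =", [round(val, 3) for val in g[3][:4]])
    print("z[3][:4] =", [round(val, 3) for val in z[3][:4]])
```

Every pixel depends only on pixels already visited by the raster scan, which is what allows the model to be written as the recursive state equations that follow.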

Based on the definition of the state vector $x(i,j)$, the following state equations are derived
from the NSHP model:

$$x(i,j+1) = F x(i,j) + d\, w(i,j), \tag{4}$$

$$z(i,j) = h\, x(i,j) + v(i,j), \tag{5}$$

for $1 \le i,j \le N$. Here $z(i,j)$ is the scalar observed image data, and $v(i,j)$ and $w(i,j)$ are scalar
zero-mean white noise fields as defined in (1) and (2). It is assumed that the system noise
$w(i,j)$ and the additive observation noise $v(i,j)$ are uncorrelated.

The $(MN + M) \times (MN + M)$ transition matrix $F$ consists of the coefficients of the
NSHP model in a companion matrix form:

$$F = \begin{pmatrix}
f(1,1) & f(1,2) & f(1,3) & \cdots & f(1,MN+M-1) & f(1,MN+M) \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}$$

where the elements $f(1,k)$ of the first row of the state transition matrix are the coefficients
of the original NSHP model (1), which are assigned as follows:

$$\begin{aligned}
f(1,k) &= a(0,n), & n &= 1, \ldots, M; & k &= 1, \ldots, M; \\
f(1,k) &= a(1,n), & n &= -M, \ldots, M; & k &= N-M, \ldots, N+M; \\
f(1,k) &= a(2,n), & n &= -M, \ldots, M; & k &= 2N-M, \ldots, 2N+M; \\
&\ \ \vdots \\
f(1,k) &= a(M-1,n), & n &= -M, \ldots, M; & k &= (M-1)N-M, \ldots, (M-1)N+M; \\
f(1,k) &= a(M,n), & n &= -M, \ldots, M; & k &= MN-M, \ldots, MN+M; \\
f(1,k) &= 0, & & & k &= \text{others}.
\end{aligned}$$

The $(MN + M) \times 1$ column vector $d$ is as follows:

$$d^T = (1 \ \ 0 \ \ \cdots \ \ 0). \tag{7}$$

The $1 \times (MN + M)$ row vector $h$ has the following form:

$$h = (1 \ \ 0 \ \ 0 \ \ \cdots \ \ 0). \tag{8}$$

The filtering problem can be stated as follows. From the NSHP image model (1) and
the noisy image data (2), the finite-dimensional, discrete-time, linear system (4) and (5) is
defined for $1 \le i,j \le N$, and the estimates

$$x_a(i,j) = E\{x(i,j) \mid z(i,j), z(i,j-1), \ldots, z(1,1)\} \tag{9}$$

and

$$x_b(i,j) = E\{x(i,j) \mid z(i,j-1), z(i,j-2), \ldots, z(1,1)\} \tag{10}$$

should be determined by the CF algorithm, where subscript b represents "before update", which
means the one-step prediction, and subscript a represents "after update", which means the
filtering estimate.


iii.  CF  Algorithm 

The Chandrasekhar algorithm for the solution to the minimum variance filtering
problem for the linear discrete-time system has been derived by Morf, Sidhu and Kailath [4].
This CF algorithm can be directly applied to the state space model (4) and (5) to yield the
following vector-format equations for the scalar CF algorithm.

Prediction Equations:

$$x_b(i,j+1) = F x_a(i,j). \tag{11}$$

Filtering Equations:

$$x_a(i,j) = x_b(i,j) + K(i,j)\,[z(i,j) - h\, x_b(i,j)], \tag{12}$$

where z is the observed image pixel value along the raster-scanned noisy image data.
Equations to Update the Kalman Gain Matrix:

$$R(i,j+1) = R(i,j) + h Y(i,j) S(i,j) Y^T(i,j) h^T, \tag{13}$$

$$K(i,j+1) = [K(i,j) R(i,j) + Y(i,j) S(i,j) Y^T(i,j) h^T]\, R^{-1}(i,j+1), \tag{14}$$

$$Y(i,j+1) = F\,[Y(i,j) - K(i,j+1)\, h\, Y(i,j)], \tag{15}$$

$$S(i,j+1) = S(i,j) + S(i,j) Y^T(i,j) h^T R^{-1}(i,j)\, h\, Y(i,j) S(i,j). \tag{16}$$

Taking advantage of the companion matrix format of the F matrix, some manipulation
involving matrix algebra yields the following set of scalar CF equations.

Prediction Equations:

$$x_{b1}(i,j+1) = \sum_{k=1}^{M} f(1,k)\, x_{ak}(i,j) + \sum_{k=N-M}^{N+M} f(1,k)\, x_{ak}(i,j) + \cdots + \sum_{k=NM-M}^{NM+M} f(1,k)\, x_{ak}(i,j), \tag{17}$$

$$x_{bk}(i,j+1) = x_{a(k-1)}(i,j), \qquad k = 2, 3, \ldots, NM + M. \tag{18}$$

Filtering Equations:

$$x_{ak}(i,j) = x_{bk}(i,j) + K_k(i,j)\,[z(i,j) - x_{b1}(i,j)], \tag{19}$$

where $k = 1, 2, \ldots, NM + M$.
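The prediction and filtering steps (17)-(19) can be sketched as follows. Because F is in companion form, the product with the state vector reduces to one inner product with the first row followed by a shift, and because h selects the first state element, the innovation is a scalar. The function names are hypothetical, and for simplicity the inner product runs over the whole first row instead of only the nonzero bands listed in (17).

```python
def predict(f_row, x_a):
    """One-step prediction x_b = F x_a for a companion-form F with first row f_row."""
    first = sum(f * x for f, x in zip(f_row, x_a))   # equation (17)
    return [first] + x_a[:-1]                        # equation (18): shift down by one


def update(x_b, gain, z):
    """Filtering step (19): correct every state element with the scalar innovation."""
    innovation = z - x_b[0]                          # h = (1 0 ... 0) picks element 1
    return [x + k * innovation for x, k in zip(x_b, gain)]


if __name__ == "__main__":
    f_row = [0.5, -0.2, 0.1, 0.3]        # illustrative first row of F
    x_a = [1.0, 2.0, 3.0, 4.0]
    x_b = predict(f_row, x_a)
    print("predicted:", x_b)
    print("filtered: ", update(x_b, [0.5, 0.25, 0.0, 0.0], z=2.0))
```

Skipping the zero entries of the first row, as (17) does, is what brings the prediction cost down to a number of multiplications proportional to the number of NSHP coefficients.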

Equations for updating the $(MN + M) \times 1$ column vectors K and Y and the scalars R and
S are easily obtained as follows:

$$R(i,j+1) = R(i,j) + S(i,j)\, Y_1^2(i,j), \tag{20}$$

$$K_k(i,j+1) = [K_k(i,j) R(i,j) + Y_k(i,j) S(i,j) Y_1(i,j)]\, R^{-1}(i,j+1), \tag{21}$$

where $k = 1, 2, \ldots, NM + M$.

$$\begin{aligned}
Y_1(i,j+1) = {} & \sum_{k=1}^{M} f(1,k)\,[Y_k(i,j) - K_k(i,j+1) Y_1(i,j)] \\
& + \sum_{k=N-M}^{N+M} f(1,k)\,[Y_k(i,j) - K_k(i,j+1) Y_1(i,j)] + \cdots \\
& + \sum_{k=NM-M}^{NM+M} f(1,k)\,[Y_k(i,j) - K_k(i,j+1) Y_1(i,j)],
\end{aligned} \tag{22}$$

$$Y_k(i,j+1) = Y_{k-1}(i,j) - K_{k-1}(i,j+1)\, Y_1(i,j), \tag{23}$$

where $k = 2, 3, \ldots, NM + M$.

$$S(i,j+1) = S(i,j) + S^2(i,j)\, Y_1^2(i,j)\, R^{-1}(i,j). \tag{24}$$

The equations can be processed recursively starting with the initial conditions

$$R(1,1) = \sigma_v^2, \tag{25}$$

$$K(1,1) = 0, \tag{26}$$

$$Y(1,1) = d, \tag{27}$$

$$S(1,1) = \sigma_w^2. \tag{28}$$
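As a sanity check on the structure of the gain recursions, the sketch below runs the vector-form updates (13)-(16) for a small companion-form system and compares the resulting gains with those from a conventional Riccati recursion. It rests on assumptions that should be read as such: the initial predicted covariance is taken as zero (which gives R = noise variance r, K = 0, Y = d and S = driving-noise variance q at the start), the gain in the Y update is the freshly updated one, and the matrix F and the values q and r are arbitrary test values, not the image model.

```python
def mat_vec(F, x):
    return [sum(fij * xj for fij, xj in zip(row, x)) for row in F]


def chandrasekhar_gains(F, q, r, steps):
    """Gains from recursions (13)-(16) with h = (1 0 ... 0) and d = (1 0 ... 0)^T."""
    n = len(F)
    R, S = r, q                          # assumed initial conditions
    K = [0.0] * n
    Y = [1.0] + [0.0] * (n - 1)          # Y = d
    gains = [K[:]]
    for _ in range(steps):
        y1 = Y[0]                                                   # h Y
        R_new = R + S * y1 * y1                                     # (13)
        K = [(k * R + y * S * y1) / R_new for k, y in zip(K, Y)]    # (14)
        Y_new = mat_vec(F, [y - k * y1 for y, k in zip(Y, K)])      # (15), new gain
        S = S + S * S * y1 * y1 / R                                 # (16), old R
        R, Y = R_new, Y_new
        gains.append(K[:])
    return gains


def riccati_gains(F, q, r, steps):
    """Reference gains from the Riccati-type covariance recursion."""
    n = len(F)
    P = [[0.0] * n for _ in range(n)]    # zero initial predicted covariance
    gains = []
    for _ in range(steps + 1):
        R = P[0][0] + r
        K = [P[i][0] / R for i in range(n)]
        gains.append(K)
        Pa = [[P[i][j] - K[i] * R * K[j] for j in range(n)] for i in range(n)]
        FP = [[sum(F[i][k] * Pa[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        P = [[sum(FP[i][k] * F[j][k] for k in range(n)) for j in range(n)]
             for i in range(n)]
        P[0][0] += q                     # process noise enters through d
    return gains


if __name__ == "__main__":
    F = [[0.5, 0.2, -0.1], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    cf = chandrasekhar_gains(F, q=1.0, r=0.5, steps=10)
    kf = riccati_gains(F, q=1.0, r=0.5, steps=10)
    print(max(abs(a - b) for ka, kb in zip(cf, kf) for a, b in zip(ka, kb)))
```

The Chandrasekhar version propagates only the vectors K and Y and the scalars R and S, whereas the Riccati version carries a full covariance matrix; that difference is the source of the operation counts compared in section iv.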


In the evaluation of the computational requirement of the CF equations (17), (19), (20),
(21), (22), (23) and (24), the quantities $S(i,j) Y_1(i,j)$ and $S(i,j) Y_1^2(i,j)$ are calculated only
once and stored. The resulting numbers of multiplications and additions for each iteration are
expressed in terms of the order M of the NSHP model as well as the image size index N. Since this scalar
algorithm restores one pixel at each iteration, these numbers are equal to the computational
requirement of each pixel restoration.

The numbers of multiplications $N_{mCF}$ and additions $N_{aCF}$ per pixel restoration can be
calculated by using equations (29) and (30), respectively:

$$N_{mCF} = 5NM + 6M^2 + 11M + 8, \tag{29}$$

$$N_{aCF} = 3NM + 6M^2 + 9M + 2. \tag{30}$$

iv. The Comparison of Computational Requirements


The computational requirement of the scalar CF algorithm will be compared with that of
the 2D Riccati-type KF algorithm as well as the suboptimal RUKF algorithm
developed by Woods and Radewan [2]-[3]. First the numbers of multiplications and
additions in each iteration of the 2D KF algorithm are evaluated. In order to compare the
computational requirements properly, the 2D scalar KF algorithm which utilizes the companion
form of the F matrix and the symmetry of the covariance matrix P should be taken into
consideration. With the same state equations as well as the prediction equations (17), (18)
and filtering equation (19), only the scalar-format updating equations of the Kalman gain
matrix K and the error covariance matrices $P_a$ and $P_b$ need to be derived and analysed. A
careful investigation and calculation yield the following results.

The number of multiplications involved in the KF equations is

$$N_{mKF} = N^2 M^2 + N(4M^3 + 6M^2 + 2M) + (4M^3 + 7M^2 + 4M). \tag{31}$$

The number of additions involved in the KF equations is

$$N_{aKF} = N^2 M^2 + N(4M^3 + 6M^2 - 2M) + 4M^3 + 7M^2 + 3. \tag{32}$$

Then the computational effort of the RUKF is evaluated. In this approach, the global state
vector (3) is partitioned into two parts. One is the local supporting part, whose elements
take part in the updating recursion, and the other consists of the remaining
elements, which are not updated at each iteration. Consequently, the Kalman gain
matrix K and the error covariance matrices $P_a$ and $P_b$ are partitioned accordingly. Only
the parts of the matrix K and the matrix $P_a$ corresponding to the local supporting part of the state
space vector are updated. This approximation reduces the computational burden
significantly, but the RUKF algorithm gives only a suboptimal filtering result and also
has poor numerical stability. Following the scalar format of the RUKF equations,
the numbers of multiplications and additions are carefully evaluated. Taking into account
the companion matrix format and the symmetry property, the evaluation results can be
expressed as follows.

The number of multiplications involved in the RUKF equations is

$$N_{mRUKF} = N(6M^3 + 2M^2 D + 6M^2 + DM + M) + (6M^3 + 2M^2 D + 12M^2 + 5MD + 7M + 2D + 3). \tag{33}$$

The number of additions involved in the RUKF equations is

$$N_{aRUKF} = N(6M^3 + 2M^2 D + 6M^2 + DM - 2M) + (6M^3 + 2M^2 D + 10M^2 + 3MD + 2M + D + 4). \tag{34}$$

In order to make the RUKF algorithm more numerically stable and the filtering result more
accurate, the local supporting area is often enlarged by adding 2D x M pixels to the
NSHP supporting pixel area. This is why the variable D appears in equations (33) and
(34).

Comparing the numbers of major computer operations $N_{mKF}$ (31) and $N_{aKF}$ (32)
of the scalar KF algorithm and $N_{mRUKF}$ (33) and $N_{aRUKF}$ (34) of the RUKF algorithm
with $N_{mCF}$ (29) and $N_{aCF}$ (30) of the scalar CF algorithm, we can see that the number
of major computer operations per pixel restoration of the scalar KF algorithm is
$O(N^2 M^2)$, that of the RUKF is $O(NM^3)$ and that of the scalar CF is
only $O(NM)$. The improvement in computational expense is obvious. Since the
computational burden of the KF and RUKF algorithms increases much faster than that of the
scalar CF algorithm as M or N increases, a simple numerical example, N = 64,
M = 1 and D = 2, gives an idea of how much computational burden is saved when
the scalar CF algorithm is used. In this example, the KF algorithm needs about 4879
multiplications and 4622 additions per pixel restoration; the RUKF algorithm needs about
1257 multiplications and 1058 additions per pixel restoration; and the scalar CF algorithm
needs only 345 multiplications and 209 additions per pixel restoration. Thus
the scalar CF needs only about 7% of the multiplications and about 4.5% of the
additions that the KF algorithm does, and about 27% of the multiplications and less
than 20% of the additions that the suboptimal RUKF algorithm does.
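The CF and KF figures in this example can be reproduced directly from the operation-count formulas (29)-(32). The short script below is only a convenience for checking the arithmetic; the function names are hypothetical, and the RUKF counts are left out.

```python
def cf_ops(N, M):
    """Multiplications and additions per pixel for the scalar CF, equations (29)-(30)."""
    mults = 5 * N * M + 6 * M ** 2 + 11 * M + 8
    adds = 3 * N * M + 6 * M ** 2 + 9 * M + 2
    return mults, adds


def kf_ops(N, M):
    """Multiplications and additions per pixel for the scalar KF, equations (31)-(32)."""
    mults = N ** 2 * M ** 2 + N * (4 * M ** 3 + 6 * M ** 2 + 2 * M) + (4 * M ** 3 + 7 * M ** 2 + 4 * M)
    adds = N ** 2 * M ** 2 + N * (4 * M ** 3 + 6 * M ** 2 - 2 * M) + 4 * M ** 3 + 7 * M ** 2 + 3
    return mults, adds


if __name__ == "__main__":
    N, M = 64, 1
    cf_m, cf_a = cf_ops(N, M)
    kf_m, kf_a = kf_ops(N, M)
    print(f"CF: {cf_m} multiplications, {cf_a} additions")
    print(f"KF: {kf_m} multiplications, {kf_a} additions")
    print(f"CF/KF: {100 * cf_m / kf_m:.2f}% of multiplications, {100 * cf_a / kf_a:.2f}% of additions")
```

Because the KF count grows with $N^2$ while the CF count grows with $N$, the ratio shrinks further for larger images.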

v.  Simulation  Results 

A random field is generated by a 1 x 1 order NSHP model, which represents the
noise-free image shown in Fig. 1. White noise is added to this generated image to
produce the noise-contaminated image with SNR = 3 dB, which is shown in Fig. 2. An
estimated image is then computed using the developed scalar CF algorithm, which yields
SNR = 12.3 dB. Fig. 3 displays the estimated image. The experiment also shows that the
algorithm converges fast and possesses good numerical properties.

vi.  Conclusion 

The 2D optimal scalar CF algorithm has been derived and implemented in this paper.
The computational requirement of this new algorithm is reduced significantly compared
with the 2D KF algorithm and the suboptimal RUKF algorithm. The effectiveness of this
algorithm is verified by processing a simulated image. The experiment also shows that the
numerical properties of the CF algorithm are better than those of the conventional KF algorithm.

Figures:


Fig. 1. The noise-free image g(i,j)
generated by a 1 x 1 NSHP model


Fig.  2.  The  white  noise  corrupted 
image  z(i,j)  with  SNR  =  3  dB 


Fig.  3.  The  filtered  version  of  the  image 
in  Fig.  2  obtained  with  scalar  CF. 


OBJECT TRACKING USING SENSOR FUSION

ABSTRACT. Motion is an important cue for extracting moving targets. This is
especially so when the targets are camouflaged in the background such that segmentation of
the scene does not reveal any information about the targets. There are three different
approaches for using passive sensors to obtain optical flow fields, which are the
projected velocity vectors of the points on the moving objects on a plane perpendicular to
the line of sight. These approaches are matching methods, spatio-temporal gradient based
techniques and biologically based methods. Problems common to these approaches are the
sparseness of the optical flow fields and the existence of sensor and algorithm generated
false vectors.

To alleviate these problems we use a novel multi-sensor technique. In this paper we
investigate a gradient based approach to obtain optical flow fields from sequences of
multi-sensor images. Due to the particular nature of each sensor, the optical flow
fields obtained from the individual sensors contain overlapping and complementary vectors, and
this property is exploited in our approach. A joint multidimensional histogram of the sensors'
optical flow fields in terms of their salient features, such as magnitude and direction, is
created. The highest peaks in this multidimensional space correspond to the different
moving targets. This information can then be used to segment the scene and to separate the
moving targets from the background.

This technique is potentially powerful, simple to implement and relatively insensitive to
background noise.

I. INTRODUCTION. Motion is an important cue for extracting information
from moving objects. In numerous situations, an otherwise non-detectable target,
camouflaged in the background, can be detected and recognized due only to its motion.
There are three basically different approaches in the literature for image based motion
detection, namely matching techniques, spatio-temporal gradient methods and biologically
based techniques [1-11]. In all of these approaches sequences of images containing the
moving targets are used to obtain optical flow fields. Optical flow fields are the projected
velocity vectors associated with each point of the scene on a plane perpendicular to the line of
sight. One of the main problems common to all these approaches is that the optical flow
fields are sparse, because textural variations on the target as viewed by a sensor
are usually small. Another problem is the presence, in the generated optical flow fields,
of background and algorithm induced false vectors that can adversely affect the entire
detection/recognition system.

In this paper, we present a multi-sensor approach for object tracking. The sensors are
assumed to be imaging and relatively collocated. There is a trend toward the multi-sensor
approach in many industrial and military applications [11]. This trend is justified and
encouraged by the need for more reliable information and the potential robustness in
performance that is usually associated with multi-sensor systems. The existence of multiple
sensors brings with it the problem of sensor fusion: how to synergistically combine the
information that is available through individual sensors.

Our approach addresses these problems by presenting a novel object tracking technique that
synergistically fuses the motion information available from the single sensors.

II. OBTAINING OPTICAL FLOW FIELDS. Optical flow can be obtained by
matching, spatio-temporal and biologically based methods. In the matching methods an
attempt is made to establish correspondence between successive image frames of the
same scene. This correspondence is achieved by a scene matching technique. Usually the
images need to be segmented before they can be used by these methods.

Spatio-temporal  methods  on  the  other  hand  work  on  the  raw  images,  do  not  need  the 
solution  to  the  correspondence  problem  and  are  good  for  determining  the  optical  flow  of 
multi-target  scenes. 

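The basis for the spatio-temporal gradient technique is the so-called gradient constraint
equation, which relates the changes in the brightness of images in successive frames to the
temporal changes in the scene. For an object of constant brightness $u[x, y, t]$, the
following equation is derived:

$$C_1 = V \cdot \nabla u + \frac{\partial u}{\partial t} \qquad (1)$$

where

$$V \equiv \left[\frac{\partial x}{\partial t},\ \frac{\partial y}{\partial t}\right]^{T} \qquad (2)$$

$$\nabla u \equiv \left[\frac{\partial u}{\partial x},\ \frac{\partial u}{\partial y}\right]^{T} \qquad (3)$$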

There are several approaches to determining the optical flow $V$ using Equation (1). These
techniques differ in their statement of the second constraint equation and in the expression
that is to be minimized [3,4,5]. Figure 1 shows two sequential frames of a moving scale
model car, obtained by a video camera at the rate of 30 frames per second. The resultant
optical flow field shown in Figure 2 is obtained by using equation (1) and a second
constraint relation, $C_2$, in the following minimization relation:

$$\text{Minimize error} = E = \iint \left(\alpha^2 C_2 + C_1^2\right)\,dx\,dy \qquad (5)$$


where samples are taken at discrete points in space and time and quantized in brightness.
The partial derivatives $\partial u/\partial x$, $\partial u/\partial y$ and $\partial u/\partial t$
are estimated by averages using eight measurements in two image frames. As can be seen,
optical flow conveys information about the outer boundaries of the car. This information
can be used as an aid in scene segmentation.

Figure 1. The two frames of a sequence of a moving car. (a) First frame; (b) Second frame.

Figure 2. Optical flow field of the car using the first two frames of pictures.
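
The estimation step described in Section II can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the second constraint is a Horn-Schunck style smoothness term (the second constraint relation itself is not legible in the text), and the function names `derivatives`, `local_mean` and `horn_schunck`, the weight `alpha` and the iteration count are all hypothetical choices. Images are plain lists of rows, with x along columns and y along rows.

```python
def derivatives(f1, f2):
    """Estimate u_x, u_y, u_t by averaging first differences over each
    2x2x2 cube of samples (eight measurements in two frames)."""
    rows, cols = len(f1) - 1, len(f1[0]) - 1
    ux = [[0.0] * cols for _ in range(rows)]
    uy = [[0.0] * cols for _ in range(rows)]
    ut = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            a, b, c, d = f1[i][j], f1[i][j + 1], f1[i + 1][j], f1[i + 1][j + 1]
            e, f, g, h = f2[i][j], f2[i][j + 1], f2[i + 1][j], f2[i + 1][j + 1]
            ux[i][j] = 0.25 * ((b - a) + (d - c) + (f - e) + (h - g))
            uy[i][j] = 0.25 * ((c - a) + (d - b) + (g - e) + (h - f))
            ut[i][j] = 0.25 * ((e - a) + (f - b) + (g - c) + (h - d))
    return ux, uy, ut


def local_mean(v):
    """Average of the four nearest neighbours, with the border replicated."""
    rows, cols = len(v), len(v[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up, down = v[max(i - 1, 0)][j], v[min(i + 1, rows - 1)][j]
            left, right = v[i][max(j - 1, 0)], v[i][min(j + 1, cols - 1)]
            out[i][j] = 0.25 * (up + down + left + right)
    return out


def horn_schunck(f1, f2, alpha=1.0, n_iter=100):
    """Iteratively reduce the gradient-constraint residual C1 of equation (1)
    plus an assumed smoothness penalty weighted by alpha**2."""
    ux, uy, ut = derivatives(f1, f2)
    rows, cols = len(ux), len(ux[0])
    vx = [[0.0] * cols for _ in range(rows)]
    vy = [[0.0] * cols for _ in range(rows)]
    for _ in range(n_iter):
        mx, my = local_mean(vx), local_mean(vy)
        for i in range(rows):
            for j in range(cols):
                gx, gy = ux[i][j], uy[i][j]
                # Residual of equation (1) at the neighbourhood-mean flow.
                c1 = gx * mx[i][j] + gy * my[i][j] + ut[i][j]
                r = c1 / (alpha ** 2 + gx * gx + gy * gy)
                vx[i][j] = mx[i][j] - gx * r
                vy[i][j] = my[i][j] - gy * r
    return vx, vy
```

Applied to a smooth synthetic pattern shifted by one pixel along x between the two frames, the recovered field should be close to one pixel per frame in x and close to zero in y.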


III. SEGMENTING AND TRACKING. Figure 3 shows the steps involved in
using the optical flow from different sensors to segment and track moving objects. The
case of two sensors is shown; its generalization to more imaging sensors is
straightforward.

The optical flow obtained from each sensor is used to form a joint histogram in terms of the
salient attributes of the optical flow fields from each sensor. These attributes, in their
simplest form, are the angular direction and the length of the vectors that are present in the
optical flow fields. The histogram is formed by placing, in a 3D coordinate system, the
length of each vector on the x axis, the directional angle of the corresponding vector on the
y axis, and the number of vectors that have the same length and directional angle on the z
axis. Figure 4 illustrates this point. As can be seen, different targets having different
velocities can be grouped separately. Moreover, in the joint histogram space the true moving
target fields reinforce each other, leading to a higher peak, whereas the false vectors that
are present in one sensor's optical flow field but not in the other have a diminishing
effect.

From the peaks in the histogram domain, the corresponding optical flow vectors in
the single-sensor fields are identified. Due to noise, not all of the optical flow vectors
corresponding to a target have exactly the same direction and length; for this reason the
peaks in the histogram space are not sharp vertical lines. To tolerate these
variations, one needs to choose a window centered at each peak and consider all of the
vectors falling inside the window as belonging to the same target. The size of the window
can be chosen experimentally. Once the optical flow vectors in the single-sensor optical
flow fields that correspond to different targets are identified, the points in the image
domain corresponding to the joint sensor optical flows can be determined, leading to the
segmentation and tracking of the moving objects. Notice that each sensor may produce
only partial flow fields for a moving target, depending on the nature of the sensor used; for
example, an infrared sensor may produce flow vectors at the junctions of temperature
variations, but these vectors may be missing in the visible flow fields, and vice versa.

However,  in  our  proposed  approach,  these  partial  flow  fields  are  combined  to  produce  a 
potentially  better  segmentation  result  leading  to  better  tracking  performance. 
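
The histogram and windowing steps of this section can be sketched as below. This is only an illustration under stated assumptions: the joint histogram is taken to be the pooled count of vectors from all sensors in shared (length, direction) bins, flow fields are lists of `(x, y, vx, vy)` tuples, and the names `flow_bin`, `joint_histogram`, `in_window`, `target_pixels` as well as the bin sizes are hypothetical.

```python
import math
from collections import Counter

MAG_STEP = 0.5   # assumed bin width for vector length (pixels per frame)
N_ANG = 24       # assumed number of direction bins (15 degrees each)


def flow_bin(vx, vy):
    """Quantise one flow vector into a (length bin, direction bin) pair."""
    length = math.hypot(vx, vy)
    angle = math.atan2(vy, vx) % (2.0 * math.pi)
    return int(length / MAG_STEP), int(angle / (2.0 * math.pi) * N_ANG) % N_ANG


def joint_histogram(*fields):
    """Count the non-zero flow vectors of all sensors in one shared histogram."""
    hist = Counter()
    for field in fields:
        for _x, _y, vx, vy in field:
            if vx != 0.0 or vy != 0.0:
                hist[flow_bin(vx, vy)] += 1
    return hist


def in_window(b, peak, half_width=1):
    """True if bin b lies within half_width bins of the peak (direction wraps)."""
    d_len = abs(b[0] - peak[0])
    d_ang = abs(b[1] - peak[1])
    d_ang = min(d_ang, N_ANG - d_ang)
    return d_len <= half_width and d_ang <= half_width


def target_pixels(field, peak, half_width=1):
    """Image points of one sensor whose flow vectors fall inside the peak window."""
    return [(x, y) for x, y, vx, vy in field
            if (vx != 0.0 or vy != 0.0)
            and in_window(flow_bin(vx, vy), peak, half_width)]


# Synthetic example: both sensors see the target moving with flow (2.0, 1.0);
# each sensor also contains false vectors that the other sensor does not.
sensor_a = [(x, 5, 2.0, 1.0) for x in range(5)] + [(x, 20, -1.0, 0.2) for x in range(3)]
sensor_b = [(x, 6, 2.0, 1.0) for x in range(4)] + [(x, 30, 0.3, -2.0) for x in range(3)]

hist = joint_histogram(sensor_a, sensor_b)
peak, count = hist.most_common(1)[0]
```

In this toy data the target vectors from both sensors land in the same bin and reinforce each other, while each group of false vectors stays in its own, lower bin; `target_pixels(sensor_a, peak)` then returns the image points attributed to the target in the first sensor.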

The preliminary results of implementing the approach have been encouraging. These results
are being analyzed and will be published in the future.

IV. SUMMARY. A novel multi-sensor technique for segmenting and tracking
objects was presented. Optical flow fields available from the single sensors are
synergistically combined to produce potentially better segmentation and tracking
performance. This technique has several potential benefits, mainly:

o  Robustness in the presence of registration error; the field of view and resolution
   of the sensors need not be exactly the same

o  Robustness in the presence of scene and algorithm dependent noise

o  Better segmentation due to the aggregation of object related features from
   optical flow fields

o  Other usual multi-sensor benefits such as
      - applicability in various weather conditions, by day or by night
      - better countermeasure immunity

The costs of the approach are mainly:

o  Physical complexity of using multiple sensors

o  Computational cost

Future experimentation will show whether the potential benefits of this approach will
justify the cost that it entails.


RANDOM FIELD IDENTIFICATION FROM A SAMPLE

Millu Rosenblatt-Roth

Center for Automation Research, University of Maryland

1. Introduction.

In what follows we consider the following problem: given a sample, determine the random
field that generated it. In order to make the problem reasonable, it is necessary to assume
that the field is not arbitrary but belongs to some specific class. Making such an assumption
means in reality that we are not considering the problem of finding the field that generated
the given sample, but some other field that belongs to the given class and approximates the
field that generated the given sample. In this paper we will consider that the approximating
field to be found is stationary and composed of independent random variables, so that it may
be considered one-dimensional.

In this paper we will not be interested in the problem of evaluating how good this
approximation is, because this aspect is treated in the author's papers [5], [6].

In order to be able to deal with digitised data and at the same time to reduce the complexity
of the problem, we will consider only the case in which each of the random variables takes
only a finite set of values. In order to extend these results to the continuous case, we would
have to consider some process of approximation, such as that used by the author in [2]-[4].


where each $k_r$ ($1 \le r \le s$) can take any value $i$ ($1 \le i \le n$). Because of
stationarity, the probability of occurrence of the sequence $C_s$ does not depend on the
moment when the trials begin; taking into consideration the independence of the trials,
this probability can be written as

$$P(C_s) = \prod_{r=1}^{s} P(A_{k_r}) \qquad (2.2)$$

Let us denote by $m_i$ ($1 \le i \le n$) the number of times the outcome $A_i$ appears in
the sequence $C_s$, so that

$$\sum_{i=1}^{n} m_i = s \qquad (2.3)$$

The equality (2.2) can be written

$$P(C_s) = \prod_{i=1}^{n} p_i^{m_i} \qquad (2.4)$$

In what follows we denote by

$$H = \sum_{i=1}^{n} p_i \log \frac{1}{p_i} \qquad (2.5)$$

the entropy of the random field characterised by the probabilities $p_i$ ($1 \le i \le n$), and

$$\rho = \sum_{i=1}^{n} \log \frac{1}{p_i} \qquad (2.6)$$

Obviously

$$0 < \rho < \infty \qquad (2.7)$$

2.2  The  theorem 


Numerical examples are given, showing that good approximations can be obtained based on
relatively small sample sizes. In particular, this approach can be used to find random field
models that generate given samples of image texture, and so can be applied to texture
classification or segmentation. Similar results were already obtained by the author for
simple stationary Markov chains [7] as well as for unilateral Markov two-dimensional fields,
and will be presented on other occasions.

The author thanks Prof. Azriel Rosenfeld for suggesting the problem, for the many substantial
discussions of this subject, as well as for his great interest in and sponsoring of this
research.


2. The direct theorem.

2.1. Generalities

Let us consider a sequence of independent trials with possible outcomes $A_i$
($1 \le i \le n$) and corresponding probabilities $p_i > 0$ ($1 \le i \le n$) adding up to 1.
Each possible result of a series of $s$ consecutive trials can be written as a sequence

$$C_s = (A_{k_1}, A_{k_2}, \ldots, A_{k_s}) \qquad (2.1)$$


The support of the U. S. Air Force Office of Scientific Research under Contract
F49620-85-K-0009 is gratefully acknowledged.


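Let us denote by $\Gamma_s$ the class of all sequences $C_s$. For given $\delta > 0$,
$s > 0$ we denote by $\Gamma'_{\delta,s}$ the set of all sequences $C_s \in \Gamma_s$ such that

$$|m_i - s p_i| \le s\delta \qquad (2.8)$$

for all $i$ ($1 \le i \le n$), and by $\Gamma''_{\delta,s}$ its complement with respect to
$\Gamma_s$.

Definition. Sequences $C_s \in \Gamma'_{\delta,s}$ will be called $(\delta,s)$-standard
sequences or simply standard sequences.

Let us consider the equation

$$\frac{1}{\sqrt{2\pi}} \int_0^{u} \exp\left(-\frac{x^2}{2}\right) dx = \frac{1}{2}\left(1 - \frac{\varepsilon}{n}\right) \qquad (2.9)$$

and let us denote by $u(\varepsilon)$ its solution.

Definition. Given $\varepsilon > 0$, $\delta > 0$, $s > n$, condition A holds if

$$4\delta^2 \varepsilon s \ge n \qquad (2.10)$$

and condition B holds if

$$4\delta^2 s \ge u^2(\varepsilon) \qquad (2.11)$$

Let us denote by $N(\cdot)$ the cardinality of a set.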
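
Conditions A and B can be read as lower bounds on $\delta$ for given $n$, $s$ and $\varepsilon$. The following sketch computes these bounds; it assumes the reading $4\delta^2\varepsilon s \ge n$ and $4\delta^2 s \ge u^2(\varepsilon)$ of (2.10) and (2.11), and uses the fact that (2.9) is equivalent to $\Phi(u) = 1 - \varepsilon/(2n)$ for the standard normal distribution function $\Phi$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def u_of_eps(eps, n):
    """Solve (2.9): Phi(u) - 1/2 = (1 - eps/n)/2, i.e. Phi(u) = 1 - eps/(2n)."""
    return NormalDist().inv_cdf(1.0 - eps / (2.0 * n))


def min_delta_A(n, s, eps):
    """Smallest delta allowed by condition A (2.10): 4 delta^2 eps s >= n."""
    return math.sqrt(n / (4.0 * eps * s))


def min_delta_B(n, s, eps):
    """Smallest delta allowed by condition B (2.11): 4 delta^2 s >= u(eps)^2."""
    return u_of_eps(eps, n) / (2.0 * math.sqrt(s))
```

For $n = 2$, $s = 10^4$, $\varepsilon = 0.125$, `min_delta_A` gives 0.02, while `u_of_eps` gives roughly 1.86, so that condition B admits a smaller $\delta$ than condition A for the same data.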

Theorem 1. Let us suppose that at least one of the conditions A, B holds. Then

(a) If $C_s$ is a $(\delta,s)$-standard sequence, it follows that

$$\left| \frac{1}{s} \log \frac{1}{P(C_s)} - H \right| \le \delta\rho \qquad (2.12)$$

(b)
$$P(\Gamma'_{\delta,s}) > 1 - \varepsilon \qquad (2.13)$$

(c)
$$\lim_{\delta \to 0,\ s \to \infty} \frac{1}{s} \log N(\Gamma'_{\delta,s}) = H \qquad (2.14)$$

Remark 1. The relation (2.12) is equivalent to

$$2^{-s(H+\delta\rho)} \le P(C_s) \le 2^{-s(H-\delta\rho)} \qquad (2.15)$$

i.e. to

$$P(C_s) = 2^{-sH + s\delta\theta\rho}, \qquad |\theta| \le 1 \qquad (2.16)$$

Remark 2. The relation (2.13) is equivalent to

$$P(\Gamma''_{\delta,s}) < \varepsilon \qquad (2.17)$$
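
Part (a) of Theorem 1 can be checked numerically for a concrete sequence. The sketch below evaluates $H$, $\rho$, the standardness test (2.8) and the quantity $\frac{1}{s}\log\frac{1}{P(C_s)}$ directly from the counts $m_i$; logarithms are taken to base 2, in line with the powers of 2 in (2.15). The function names are hypothetical.

```python
import math


def entropy(p):
    """H = sum p_i log2(1/p_i), equation (2.5)."""
    return sum(pi * math.log2(1.0 / pi) for pi in p)


def rho(p):
    """rho = sum log2(1/p_i), equation (2.6)."""
    return sum(math.log2(1.0 / pi) for pi in p)


def is_standard(m, p, delta):
    """Test (2.8): |m_i - s p_i| <= s delta for every outcome i."""
    s = sum(m)
    return all(abs(mi - s * pi) <= s * delta for mi, pi in zip(m, p))


def per_trial_information(m, p):
    """(1/s) log2(1/P(C_s)) with P(C_s) = prod p_i^{m_i}, equation (2.4)."""
    s = sum(m)
    return sum(mi * math.log2(1.0 / pi) for mi, pi in zip(m, p)) / s
```

For example, with $p = (0.3, 0.7)$, counts $m = (3050, 6950)$ and $\delta = 0.02$, the sequence is standard, and the distance between `per_trial_information(m, p)` and `entropy(p)` stays below `delta * rho(p)`, as (2.12) requires.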

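Remark 3. From (2.14) it follows that

$$\lim_{\delta \to 0,\ s \to \infty} \frac{N(\Gamma'_{\delta,s})}{N(\Gamma_s)} = 0, \qquad \lim_{\delta \to 0,\ s \to \infty} \frac{N(\Gamma''_{\delta,s})}{N(\Gamma_s)} = 1 \qquad (2.18)$$

Indeed, from (2.14) we obtain the relation

$$\log N(\Gamma'_{\delta,s}) = s\,(H + o(1)) \qquad (2.19)$$

i.e.,

$$N(\Gamma'_{\delta,s}) = 2^{s(H + o(1))} \qquad (2.20)$$

Taking into consideration that

$$N(\Gamma_s) = n^s = 2^{s \log n} \qquad (2.21)$$

and because

$$H < \log n, \qquad (2.22)$$

it follows that

$$\frac{N(\Gamma'_{\delta,s})}{N(\Gamma_s)} = 2^{-s(\log n - H + o(1))} = o(1) \qquad (2.23)$$

which is equivalent to the first equality in (2.18), and

$$\frac{N(\Gamma''_{\delta,s})}{N(\Gamma_s)} = \frac{N(\Gamma_s) - N(\Gamma'_{\delta,s})}{N(\Gamma_s)} = 1 - \frac{N(\Gamma'_{\delta,s})}{N(\Gamma_s)} = 1 + o(1) \qquad (2.24)$$

which is equivalent to the second equality in (2.18).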


Remark 4. Our Theorem 1 is closely related to some results which go back to Shannon [9] and
received a mathematically acceptable form from Khinchine [1].

Our Theorem 1 (a), (b) refers to independent random variables, while that in [3] refers to
ergodic simple Markov chains, but our result is not a particular case of that in [3]. Indeed,
the results in [3] are existence theorems, considering that $\delta$, $\varepsilon$ can be
taken as small and $s$ as large as desired, while our results give effective relations between
$\delta$, $\varepsilon$, $s$ in order that the results hold.


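From (2.4) there follows the relation

$$\log P(C_s) = \sum_{i=1}^{n} m_i \log p_i \qquad (2.26)$$

and taking into consideration (2.25), there follows the equality

$$\log P(C_s) = \sum_{i=1}^{n} (s p_i + s\delta\theta_i) \log p_i = s \sum_{i=1}^{n} p_i \log p_i + s\delta \sum_{i=1}^{n} \theta_i \log p_i \qquad (2.27)$$

which can also be written as

$$\log \frac{1}{P(C_s)} = sH + s\delta \sum_{i=1}^{n} \theta_i \log \frac{1}{p_i} \qquad (2.28)$$

From (2.28) we obtain the result (a):

$$\left| \frac{1}{s} \log \frac{1}{P(C_s)} - H \right| \le \delta \sum_{i=1}^{n} |\theta_i| \log \frac{1}{p_i} \le \delta \sum_{i=1}^{n} \log \frac{1}{p_i} = \delta\rho \qquad (2.29)$$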


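(b) Instead of proving inequality (2.13) we will prove (2.17). In order that a sequence
$C_s \in \Gamma_s$ belong to $\Gamma''_{\delta,s}$, it is necessary that for at least
some value of $i$ ($1 \le i \le n$) the inequality (2.8) does not hold, i.e.,

$$\Gamma''_{\delta,s} = \bigcup_{i=1}^{n} \{ |m_i - s p_i| > s\delta \} \qquad (2.30)$$

so that

$$P(\Gamma''_{\delta,s}) = P\left\{ \bigcup_{i=1}^{n} \{ |m_i - s p_i| > s\delta \} \right\} \le \sum_{i=1}^{n} P\{ |m_i - s p_i| > s\delta \} \qquad (2.31)$$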


(b1) Let us assume that condition A holds. It is known from the elements of the theory of
probability that

$$P\{ |m_i - s p_i| \ge s\delta \} \le \frac{p_i (1 - p_i)}{s \delta^2} \qquad (2.32)$$

But for $0 \le x \le 1$, we have the inequalities

$$0 \le x(1 - x) \le \frac{1}{4} \qquad (2.33)$$

where the maximum value is reached for $x = \frac{1}{2}$, so that from (2.32) it follows that

$$P\{ |m_i - s p_i| \ge s\delta \} \le \frac{1}{4 s \delta^2} \qquad (1 \le i \le n) \qquad (2.34)$$

Consequently, from (2.31) there follows the inequality

$$P(\Gamma''_{\delta,s}) \le \frac{n}{4 s \delta^2} \qquad (2.35)$$

and because of (2.10), it follows that (2.17) holds.
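
The bound (2.35) can be compared with the exact probability of the non-standard set in a small case. The sketch below treats $n = 2$, where $m_2 = s - m_1$, so that $\Gamma''_{\delta,s}$ reduces to the event $|m_1 - s p_1| > s\delta$ for a binomial count $m_1$. The function names and the numerical values in the usage note are illustrative assumptions.

```python
import math


def exact_nonstandard_prob(s, p1, delta):
    """P(|m_1 - s p_1| > s delta) for n = 2, where m_1 ~ Binomial(s, p1)."""
    total = 0.0
    for m1 in range(s + 1):
        if abs(m1 - s * p1) > s * delta:
            total += math.comb(s, m1) * p1 ** m1 * (1.0 - p1) ** (s - m1)
    return total


def chebyshev_bound(n, s, delta):
    """Right-hand side of (2.35): n / (4 s delta^2)."""
    return n / (4.0 * s * delta ** 2)
```

With $s = 400$, $p_1 = 0.3$ and $\delta = 0.05$, the bound evaluates to 0.5 while the exact probability is far smaller, which illustrates how conservative the Chebyshev-type estimate behind condition A can be.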


Our Theorem 1 (c) refers to the set $\Gamma'_{\delta,s}$ of all standard sequences $C_s$,
while the result in ([1], Th. 3) refers to another set of sequences $C_s$; our result
contains a limit for $\delta \to 0$, $s \to \infty$, while the result in ([1], Th. 3)
contains a limit for $s \to \infty$.

2.3. Proof

(a) Let us consider a sequence $C_s \in \Gamma'_{\delta,s}$. From (2.8) it follows that

$$m_i = s p_i + s\delta\theta_i, \qquad |\theta_i| \le 1 \qquad (1 \le i \le n) \qquad (2.25)$$


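(b2) Let us assume that condition B holds. From the central limit theorem in the
Moivre-Laplace form, it is known that

$$P\left\{ \left| \frac{m_i}{s} - p_i \right| < \delta \right\} = P\left\{ \frac{|m_i - s p_i|}{\sqrt{s p_i (1 - p_i)}} < \delta \sqrt{\frac{s}{p_i (1 - p_i)}} \right\} \approx \frac{2}{\sqrt{2\pi}} \int_0^{\delta \sqrt{s / (p_i (1 - p_i))}} e^{-x^2/2}\, dx \qquad (1 \le i \le n) \qquad (2.36)$$

so that

$$P\{ |m_i - s p_i| \ge \delta s \} \approx 1 - \frac{2}{\sqrt{2\pi}} \int_0^{\delta \sqrt{s / (p_i (1 - p_i))}} e^{-x^2/2}\, dx \qquad (1 \le i \le n) \qquad (2.37)$$

In order to obtain the relation (2.17) it is sufficient to take

$$1 - \frac{2}{\sqrt{2\pi}} \int_0^{\delta \sqrt{s / (p_i (1 - p_i))}} e^{-x^2/2}\, dx \le \frac{\varepsilon}{n} \qquad (1 \le i \le n) \qquad (2.38)$$

i.e.,

$$\frac{1}{\sqrt{2\pi}} \int_0^{\delta \sqrt{s / (p_i (1 - p_i))}} e^{-x^2/2}\, dx \ge \frac{1}{2}\left(1 - \frac{\varepsilon}{n}\right) \qquad (1 \le i \le n) \qquad (2.39)$$

which is equivalent to the inequality

$$\delta \sqrt{\frac{s}{p_i (1 - p_i)}} \ge u(\varepsilon) \qquad (2.40)$$

Because of (2.33), we have the inequality

$$\delta \sqrt{\frac{s}{p_i (1 - p_i)}} \ge 2\delta\sqrt{s} \qquad (1 \le i \le n) \qquad (2.41)$$

so that in order to satisfy (2.40) it is sufficient to take into consideration condition B
(2.11), i.e.

$$2\delta\sqrt{s} \ge u(\varepsilon). \qquad (2.42)$$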


(c) If $C_s \in \Gamma'_{\delta,s}$, then (2.15) holds, so that

$$N(\Gamma'_{\delta,s})\, 2^{-s(H + \delta\rho)} \le \sum P(C_s) = P(\Gamma'_{\delta,s}) \le 1 \qquad (2.43)$$

where the summation is for all $C_s \in \Gamma'_{\delta,s}$. From (2.43) there follows the
relation

$$\frac{1}{s} \log N(\Gamma'_{\delta,s}) \le H + \delta\rho \qquad (2.44)$$

In a similar way, from (2.13), (2.15) there follow the relations

$$1 - \varepsilon < P(\Gamma'_{\delta,s}) = \sum P(C_s) \le N(\Gamma'_{\delta,s})\, 2^{-s(H - \delta\rho)} \qquad (2.45)$$

where the summation is also for all $C_s \in \Gamma'_{\delta,s}$. From (2.45) we obtain the
relation


In what follows we assume that $C_s^0$ is generated by a sequence of independent trials, with
possible outcomes $A_i$ ($1 \le i \le n$) with unknown probabilities $p_i$
($1 \le i \le n$), and we will try to determine some intervals in which those probabilities
can take values. Let us denote

$$m_i^0 = m_i(C_s^0) \qquad (1 \le i \le n) \qquad (3.1)$$

and by $w\{S\}$ the confidence of statement $S$.

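3.2. The theorem

Because we have proved that

$$P(\Gamma'_{\delta,s}) > 1 - \varepsilon, \qquad P(\Gamma''_{\delta,s}) < \varepsilon \qquad (3.2)$$

it follows that with confidence larger than $1 - \varepsilon$,
$C_s^0 \in \Gamma'_{\delta,s}$, i.e.,

$$w\{ |m_i^0 - s p_i| < \delta s,\ (1 \le i \le n) \} > 1 - \varepsilon \qquad (3.3)$$

i.e.,

$$w\left\{ \frac{m_i^0}{s} - \delta < p_i < \frac{m_i^0}{s} + \delta,\ (1 \le i \le n) \right\} > 1 - \varepsilon \qquad (3.4)$$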
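
Relation (3.4) translates directly into a small routine that turns observed counts into intervals for the unknown probabilities. This is a sketch only: it assumes that the caller has already verified condition A or B for the chosen $\delta$ and $\varepsilon$, and the function name is hypothetical.

```python
def probability_intervals(counts, delta, eps):
    """Intervals (m_i/s - delta, m_i/s + delta) of (3.4) together with the
    confidence level 1 - eps. Condition A or B is assumed to hold."""
    s = sum(counts)
    intervals = [(m / s - delta, m / s + delta) for m in counts]
    return intervals, 1.0 - eps
```

For counts (3000, 7000) with $\delta = 0.02$ and $\varepsilon = 0.125$, the routine returns the intervals (0.28, 0.32) and (0.68, 0.72) with confidence 0.875.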


Let $L_n$ be the Banach space of all vectors

$$q = (q_1, \ldots, q_n) \qquad (3.5)$$

with $q_i$ real numbers of any sign, with norm

$$\|q\| = \sup\{ |q_i|;\ 1 \le i \le n \} \qquad (3.6)$$

Let $\Pi_n$ be the totality of probability measures

$$p = (p_1, \ldots, p_n) \qquad (3.7)$$

with $p_i > 0$ ($1 \le i \le n$), and

$$\sum_{i=1}^{n} p_i = 1 \qquad (3.8)$$

This is a metric space with distance

$$\|p - p'\| = \sup\{ |p_i - p'_i|;\ 1 \le i \le n \} \qquad (3.9)$$

where $p, p' \in \Pi_n$, $p - p' \in L_n$. If $p, p' \in \Pi_n$ are two different
solutions, satisfying the inequalities in (3.4), it follows that

$$|p_i - p'_i| < 2\delta \qquad (1 \le i \le n) \qquad (3.10)$$


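$$H - \delta\rho \le \frac{1}{s} \log N(\Gamma'_{\delta,s}) + \frac{1}{s} \log \frac{1}{1 - \varepsilon} \qquad (2.46)$$

From (2.44), (2.46) it follows that

$$H - \delta\rho - \frac{1}{s} \log \frac{1}{1 - \varepsilon} \le \frac{1}{s} \log N(\Gamma'_{\delta,s}) \le H + \delta\rho \qquad (2.47)$$

For $\varepsilon$ given, arbitrary, $\delta$ as small as we want, and $s$ as large as we want,
because of (2.7) it follows that (2.14) holds.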

3. The inverse theorem.

3.1. Generalities

Let $\delta > 0$, $\varepsilon > 0$, $s > 1$, and let $C_s^0$ be an arbitrary specific
sequence, belonging to $\Gamma_s$. Let us assume that one of the conditions A or B holds.

so that from (3.9) it follows that

$$\|p - p'\| < 2\delta \qquad (3.11)$$

We have thus proved

Theorem 2. Let us assume that

(1) $\varepsilon$, $\delta$, $s$ satisfy one of conditions A, B;

(2) the arbitrary sequence $C_s^0 \in \Gamma_s$ is generated by an independent identically
distributed sequence of trials, with unknown probabilities $p_i$ ($1 \le i \le n$).

Then

(a) The relation (3.4) holds.

(b) If $p, p'$ are two different solutions, their distance in $\Pi_n$ is less than $2\delta$.

Remark 4. Let $L'_n$ be the Banach space of all vectors (3.5) with norm the total variation

$$|||q||| = \sum_{i=1}^{n} |q_i| \qquad (3.12)$$

Then $\Pi_n$ is a metric space with distance

$$|||p - p'||| = \sum_{i=1}^{n} |p_i - p'_i| \qquad (3.13)$$

where $p, p' \in \Pi_n$.

If $p, p' \in \Pi_n$ are two different solutions, satisfying (3.4), it follows from (3.13) that


$$u(\varepsilon) > 1.8 \qquad (4.7)$$

Considering Condition B in form (2.42) it is easy to see that it holds. From (3.4) it
follows that

$$w\{ 0.291 < p_1 < 0.309;\ 0.691 < p_2 < 0.709 \} > 0.875 \qquad (4.8)$$

and from (3.11) it follows that

$$\|p - p'\| < 0.018 \qquad (4.9)$$


$$|||p - p'||| < 2n\delta \qquad (3.14)$$

It is easy to see that

$$\|p - p'\| \le |||p - p'||| \le n \|p - p'\| \qquad (3.15)$$

We remark also that if $L''_n$ is the Euclidean space of all vectors (3.5) with norm

$$((q)) = \left( \sum_{i=1}^{n} q_i^2 \right)^{1/2} \qquad (3.16)$$

then $\Pi_n$ is a metric space with distance

$$((p - p')) = \left( \sum_{i=1}^{n} |p_i - p'_i|^2 \right)^{1/2} \qquad (3.17)$$

It is easy to see that

$$\|p - p'\| \le ((p - p')) \le \sqrt{n}\, \|p - p'\| \qquad (3.18)$$
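
The comparisons (3.15) and (3.18) between the three distances can be verified on any pair of probability vectors. The following sketch does so with hypothetical helper names; the two vectors used in the usage note are arbitrary illustrative values.

```python
import math


def sup_norm(q):
    """Norm (3.6): largest absolute component."""
    return max(abs(x) for x in q)


def total_variation_norm(q):
    """Norm (3.12): sum of absolute components."""
    return sum(abs(x) for x in q)


def euclidean_norm(q):
    """Norm (3.16): square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in q))


def difference(p, p_prime):
    """Componentwise difference p - p'."""
    return [a - b for a, b in zip(p, p_prime)]
```

For $p = (0.2, 0.3, 0.5)$ and $p' = (0.25, 0.3, 0.45)$ the three distances are approximately 0.05, 0.1 and 0.0707, in agreement with (3.15) and (3.18) for $n = 3$.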


4. Examples.

4.1. Examples under Condition A

Example 1. Let $C_s^0$ be a sequence with $n = 2$, $s = 10^4$,
$\varepsilon = 2^{-3} = 0.125$, $\delta \ge 0.02$, so that condition A holds. Let
$m_1^0 = 3 \times 10^3$, $m_2^0 = 7 \times 10^3$. From (3.4) it follows that

$$w\{ 0.28 < p_1 < 0.32;\ 0.68 < p_2 < 0.72 \} > 0.875 \qquad (4.1)$$

and from (3.11) we obtain

$$\|p - p'\| < 0.04 \qquad (4.2)$$

Example 2. Let $C_s^0$ be a sequence with $n = 2$, $s = 10^6$,
$\varepsilon = 2^{-3} = 0.125$, $\delta \ge 0.002$, so that condition A holds. Let
$m_1^0 = 3 \times 10^5$, $m_2^0 = 7 \times 10^5$. From (3.4) it follows that

$$w\{ 0.298 < p_1 < 0.302;\ 0.698 < p_2 < 0.702 \} > 0.875 \qquad (4.3)$$

and from (3.11) we obtain

$$\|p - p'\| < 0.004 \qquad (4.4)$$

4.2. Examples under Condition B

Example 3. Let $C_s^0$ be a sequence with $n = 2$, $\varepsilon = 2^{-3} = 0.125$,
$s = 10^4$, $\delta \ge 0.009$, $m_1^0 = 3 \times 10^3$, $m_2^0 = 7 \times 10^3$, so that

$$\frac{1}{2}\left(1 - \frac{\varepsilon}{n}\right) = 0.46875 \qquad (4.5)$$

and relation (2.39) takes the form

$$\frac{1}{\sqrt{2\pi}} \int_0^{u(\varepsilon)} e^{-x^2/2}\, dx \ge 0.46875 \qquad (4.6)$$

which holds for

Example 4. Let $C_s^0$ be a sequence with $n = 2$, $\varepsilon = 2^{-3} = 0.125$,
$s = 10^6$, $\delta \ge 0.0009$, $m_1^0 = 3 \times 10^5$, $m_2^0 = 7 \times 10^5$; in this
case, relations (4.5)-(4.8) hold, so that Condition B holds. From (3.4) it follows that

$$w\{ 0.2991 < p_1 < 0.3009;\ 0.6991 < p_2 < 0.7009 \} > 0.875 \qquad (4.10)$$

and from (3.11) it follows that

$$\|p - p'\| < 0.0018 \qquad (4.11)$$


4.3. Examples involving images that satisfy Condition A or B

Example 5. Let us consider a digital television picture, i.e., an array of $500^2$ points,
where each point can have 256 levels of gray. Here $n = 256$, $s = 500^2 = 250{,}000$; let
$\varepsilon = 1/256 = 0.00390625$. Taking these values, if we want condition A satisfied
it is sufficient that

$$4\delta^2 \times 250{,}000 \times \frac{1}{256} \ge 256 \qquad (4.12)$$

or

$$10^6\, \delta^2 \ge 256^2, \qquad (4.13)$$

i.e.,

$$\delta \ge 0.256. \qquad (4.14)$$

Consequently

$$w\left\{ \left| \frac{m_i^0}{s} - p_i \right| < 0.256;\ (1 \le i \le 256) \right\} > 0.9960937 \qquad (4.15)$$

with

$$\|p - p'\| = \max\{ |p_i - p'_i|,\ 1 \le i \le 256 \} < 0.512 \qquad (4.16)$$

Example 6. With the same basic data as in Example 5, we take $n = 256$, $s = 500^2$,
$\varepsilon = 1/256 = 0.00390625$, and we consider that condition B holds, i.e.,

$$2\delta\sqrt{s} \ge u(\varepsilon) \qquad (4.17)$$

Here

$$\frac{1}{2} \times 0.83334 = 0.41667 \qquad (4.18)$$

so that from tables it follows that

$$u(\varepsilon) \approx 1.30 \qquad (4.19)$$

Thus

$$2\delta \times 500 \ge 1.30 \qquad (4.20)$$

i.e.,

$$\delta \ge 0.0013 \qquad (4.21)$$

$$w\left\{ \left| \frac{m_i^0}{s} - p_i \right| < 0.0013;\ (1 \le i \le 256) \right\} > 0.9960937 \qquad (4.22)$$

and

$$\|p - p'\| < 0.0026 \qquad (4.23)$$


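Example 7. Let us take $n = 256$, $s = 500^2$, $\varepsilon = 1/16 = 0.0625$, and let us
assume that Condition A holds. Then

$$4\delta^2 \times 250{,}000 \times \frac{1}{16} \ge 256 \qquad (4.24)$$

i.e.,

$$10^6\, \delta^2 \ge 2^{12} \qquad (4.25)$$

or

$$\delta \ge 0.064 \qquad (4.26)$$

so that

$$w\left\{ \left| \frac{m_i^0}{s} - p_i \right| < 0.064;\ 1 \le i \le 256 \right\} > 0.9375 \qquad (4.27)$$

$$\|p - p'\| < 0.128 \qquad (4.28)$$

Example 8. Let $n = 256$, $s = 250{,}000$, $\varepsilon = 1/16 = 0.0625$ and let us assume
that Condition B holds. Then

$$\frac{1}{2}\left(1 - \frac{\varepsilon}{n}\right) = \frac{1}{2}\left(1 - \frac{1}{16} \cdot \frac{1}{256}\right) = \frac{1}{2}\left(1 - \frac{1}{4096}\right) \approx \frac{1}{2}(1 - 0.00025) = \frac{1}{2} \times 0.99975 \approx 0.49987 \qquad (4.29)$$

so that

$$u(\varepsilon) \approx 3.8 \qquad (4.30)$$

i.e.,

$$2\delta \times 500 \ge 3.8 \qquad (4.31)$$

or

$$\delta \ge 0.0038 \qquad (4.32)$$

Thus

$$w\left\{ \left| \frac{m_i^0}{s} - p_i \right| < 0.0038;\ 1 \le i \le 256 \right\} > 0.9375 \qquad (4.33)$$

$$\|p - p'\| < 0.0076 \qquad (4.34)$$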

Example 9. Let us assume that we have a 30-minute sequence of TV pictures. If we have 32
pictures in each second, we have a total of

$$32 \times 60 \times 30 = 2^4 \times 60^2 \qquad (4.35)$$

pictures, succeeding each other in time. Assuming independence between the pictures, we have
$n = 256$, $s = 500^2 \times 2^4 \times 60^2$, and let $\varepsilon = 1/256 = 0.00390625$.
Assuming that Condition A holds, the value of $\delta$ is given by

$$4\delta^2\, (250{,}000) \times 2^4 \times 60^2 \times \frac{1}{256} \ge 256 \qquad (4.36)$$

or

$$10^6\, \delta^2 \times 2^4 \times 60^2 \ge 256^2 \qquad (4.37)$$

$$10^3\, \delta \times 2^2 \times 60 \ge 256 \qquad (4.38)$$

Then

$$\delta \ge \frac{256}{10^3 \times 240} > 0.001 \qquad (4.39)$$

Consequently

$$w\left\{ \left| \frac{m_i^0}{s} - p_i \right| < 0.001;\ (1 \le i \le 256) \right\} > 0.9960937 \qquad (4.40)$$

and

$$\|p - p'\| < 0.002 \qquad (4.41)$$

Example 10. Let us consider the same problem as in Example 9, with the supposition that
Condition B holds. In this case

$$2\delta\, (500 \times 2^2 \times 60) \ge 1.30 \qquad (4.42)$$

i.e.,

$$\delta \ge \frac{13}{2{,}400{,}000} \approx 0.0000054 \qquad (4.43)$$

so that

$$w\left\{ \left| \frac{m_i^0}{s} - p_i \right| < 0.0000054;\ (1 \le i \le 256) \right\} > 0.9960937 \qquad (4.44)$$

$$\|p - p'\| < 0.0000108 \qquad (4.45)$$


APPROXIMATION OF TWO-DIMENSIONAL RANDOM FIELDS


Millu Rosenblatt-Roth
Center for Automation Research
University of Maryland
College Park, Maryland 20742


1. Introduction.

(a) In some problems of pattern recognition and image modeling, as well as in some aspects
of statistical physics, e.g. the Ising theory of ferromagnetism, the phenomena are described
with the help of random fields. Because of the large number of random variables involved, as
well as of the complexity of their interdependence, such random field representations cannot
be handled in a straightforward manner.

Because  none  of  the  random  variables  can  be 
deleted,  the  only  way  to  simplify  the  study  is  to 
restrict  their  interdependence,  i.e.  to  consider 
that  the  probabilities  referring  to  each  random 
variable  depend  only  on  the  values  of  the  random 
variables  situated  in  some  neighborhood  of  it, 
i.e.,  to  consider  random  fields  of  a  Markovian 
character. 

Obviously, each problem may ask for some specific kind of neighborhood, so that it is
necessary to use various classes of such particular random fields.

Sometimes  even  such  simplifications  are  not 
sufficient  and  it  is  necessary  to  use  various 
subclasses,  containing  Markovian  random  fields 
especially  fit  to  describe  unilateral  developing 
phenomena. 

(b) At this moment there does not exist a coherent theory of such random fields, but only
isolated results, the most outstanding being contained in the oldest paper [1] dedicated to
such problems, and unjustly forgotten today, due to the fact that those interested now in
such studies are physicists, while this paper is published in a journal of information
theory.

We mention that Chapter 2 of the present paper is a continuation and development of ideas
and results contained in [1].

(c) It is of theoretical and practical importance to evaluate the error committed in the
best approximation of the initial random field with such particular random fields of a given
class, and to find that random field of this class which produces this minimization.

This paper presents the solution of this problem for two-dimensional rectangular random
fields with arbitrary sets of states in each point.

We use the relative entropy as a measure of discrepancy and we determine explicitly for each
class of unilateral Markov fields under discussion

(a)  the  Markov  field  which  is  the  best 
approximation 

(b)  the  expression  of  the  error  committed  in  this 
best  approximation,  which  is  a  functional 
depending  on  the  probability  measure  of  the 
given  random  field. 

We  remark  that  in  this  study  some  approxima¬ 
tion  operators  appear  which  are  nonlinear  projec¬ 
tion  operators  in  some  Banach  spaces. 


2.  The  Concept  of  a  Unilateral  Markov  Field. 

Let $\sigma$ be an array of points $(i,j)$ ($1 \le i \le m$, $1 \le j \le n$) in a plane and
$\tau$ some subset of it. Let $\xi_{ij}$ be some random variable attached to the point
$(i,j)$, taking values in a measurable space

$$(X_{ij}, S_{ij}), \qquad (i,j) \in \sigma \qquad (2.1)$$

A random field $\xi$ is an array of arbitrarily dependent random variables

$$\xi_{ij}, \qquad (i,j) \in \sigma \qquad (2.2)$$

taking values in the measurable space

$$(X, S) = \prod_{(i,j) \in \sigma} (X_{ij}, S_{ij}) \qquad (2.3)$$

with joint probability measure

$$P_\xi(T) = P(\xi \in T), \qquad T \in S \qquad (2.4)$$

For any set $\tau \subset \sigma$,

$$\xi_\tau = (\xi_{ij},\ (i,j) \in \tau) \qquad (2.5)$$

is a random vector attached to the set $\tau$ and taking values in the measurable space

$$(X^\tau, S^\tau) = \prod_{(i,j) \in \tau} (X_{ij}, S_{ij}) \qquad (2.6)$$

with probability

$$P_{\xi_\tau}(T^\tau) = P(\xi_\tau \in T^\tau), \qquad T^\tau \in S^\tau \qquad (2.7)$$

Obviously $P_{\xi_\tau}$ is a marginal of $P_\xi$.

Considering conditional marginals of $P_\xi$, we denote

$$P(\xi_{kl} \in T_{kl} \mid \xi_{ij} = x_{ij},\ (i,j) \in \lambda) = P_{\xi_{kl} \mid \xi_{ij},\, (i,j) \in \lambda}(T_{kl} \mid x_{ij},\ (i,j) \in \lambda) \qquad (2.8)$$

for any subset $\lambda$ of $\sigma$, such that $\lambda$ does not contain the point $(k,l)$.

Let us consider that $\alpha(k,l)$ is the set of all points $(i,j)$ in the rectangle
$\sigma$, such that $i \le k$, $j \le l$, with the exception of the point $(k,l)$, i.e.

$$\alpha(k,l) = \{ (i,j);\ i \le k,\ j \le l \} - (k,l) \qquad (2.9)$$

In what follows we will denote by $\omega(k,l)$ some subset of $\alpha(k,l)$, and by
$\beta(k,l)$ the set of all points $(i,j)$ in $\sigma$ such that either $i \le k$ or
$j \le l$ or both, with the exception of the point $(k,l)$, i.e.

$$\beta(k,l) = \{ (i,j);\ i \le k,\ 1 \le j \le n \} \cup \{ (i,j);\ 1 \le i \le m,\ j \le l \} - (k,l) \qquad (2.10)$$

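Definition. The random field $\xi$ is unilateral of class $\omega$ if for each
$(k,l) \in \sigma$ and any $T_{kl} \in S_{kl}$

$$P_{\xi_{kl} \mid \xi_{ij},\, (i,j) \in \beta(k,l)}(T_{kl} \mid x_{ij},\ (i,j) \in \beta(k,l)) = P_{\xi_{kl} \mid \xi_{ij},\, (i,j) \in \omega(k,l)}(T_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l)) \qquad (2.11)$$

LEMMA 1. The unilateral random field $\xi$ of class $\omega$ has its joint probability
measure defined for any $T \in S$ by the expression

$$P_\xi(T) = \int_T \prod_{(k,l) \in \sigma} P_{\xi_{kl} \mid \xi_{ij},\, (i,j) \in \omega(k,l)}(dx_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l)) \qquad (2.12)$$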


Let $\omega(k,l)$ contain the points $A_s$ ($1 \le s \le N$); let us consider a given value
for $s$ ($1 \le s \le N$). Given an arbitrary point $(k,l) \in \sigma$, let us denote by

$$\theta[(u,v);\ (k,l);\ s] \qquad (2.13)$$

the set of points $(u,v) \in \sigma$ such that $\omega(u,v)$ contains the points

llj_t<t»  (2.14) 


with 


■  PCjaUlj.  (i.jiewMkfi)  (TktKj'(i':1>e,‘,’(k,l>> 

(2.19) 

for  any  point  (k,t)  6  o  . 

(c)  the  set  u*  (k,t)  is  given  by  relation  (2.16). 

Let r be a natural number. We denote by $\gamma_r(k,l)$ the set of points $(i,j)$ in $\sigma$ such that

$|k - i| \le r$, $|j - l| \le r$  (2.20)

with the exception of $(k,l)$, i.e.

$\gamma_r(k,l) = \{(i,j):\ k-r \le i \le k+r,\ l-r \le j \le l+r\} - (k,l)$  (2.21)

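Definition. The random field $\xi$ is an r-Markov field if, for any point $(k,l) \in \sigma$ and any set $T_{kl} \in S_{kl}$, the following relation holds:

$P_{\xi_{kl} \mid \xi_{ij},\,(i,j)\in\sigma-(k,l)}(T_{kl} \mid x_{ij},\ (i,j) \in \sigma-(k,l)) = P_{\xi_{kl} \mid \xi_{ij},\,(i,j)\in\gamma_r(k,l)}(T_{kl} \mid x_{ij},\ (i,j) \in \gamma_r(k,l))$  (2.22)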


Let us denote now

$\alpha_r(k,l) = \gamma_r(k,l) \cap \alpha(k,l)$  (2.23)

LEMMA 3. For any unilateral random field $\xi$ and any point $(k,l) \in \sigma$ the relation

$\omega(k,l) \subset \alpha_r(k,l)$  (2.24)

implies the relation

$\omega'(k,l) \subset \gamma_r(k,l)$  (2.25)

LEMMA 4. A unilateral random field $\xi$ of any class $\omega$ is a Markov random field.

EXAMPLES. 

1.  If 


*u!l  ■  ,k'*> 


(2.15) 


$\omega(k,l) = \alpha_r(k,l)$


(2.26) 


Let  us  denote 

N 

(i)'  (k, ()  -  u  6|(u,v)»  (k.l),  s)  -  (k,l)  (2.16) 
s-1 

LEMMA  2.  let  us  consider  that  (k,()  is  an  arbi¬ 
trary  point  in  o,  and.  Tk(  £  (k.l)  6  o.  Then 

for  each  set 

w(k,t)  C  a(k, t)  (2.17) 

there  exists  another  set  u'(k,t)  C  o  such  that 

(a)  u(k,t)  C  u»(k,t)  (2.16) 

(b)  for  any  unilateral  random  field  of  class  u, 
the  following  relation  takes  place, 

ptkt|clj,a,j)e o-  <k,n<TkiKj'  li,j)e°'  (k«tn 


then

$\omega'(k,l) = \gamma_r(k,l)$  (2.27)

so that the corresponding random field is an r-Markov random field.

In particular, let

$r = 1$  (2.28)

and

$\omega(k,l) = \{(k-1,l),\ (k-1,l-1),\ (k,l-1)\}$  (2.29)

Then

$\omega'(k,l) = \{(k-p_1,\ l-p_2);\ p_1, p_2 \in \{-1, 0, +1\}\} - (k,l) = \gamma_1(k,l)$  (2.30)

If

$\omega(k,l) = \{(k-1,l),\ (k,l-1)\}$  (2.31)

it follows that

$\omega'(k,l) = \{(k-p_1,\ l-p_2);\ p_1, p_2 \in \{-1, 0, +1\},\ p_1 \ne p_2\}$  (2.32)
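The two examples can be checked by direct enumeration. The sketch below rests on an assumption that the garbled definitions (2.13)-(2.16) do not let us confirm: it reads the set in (2.30) and (2.32) as the points of the conditioning set of (k,l), the points whose own conditioning set contains (k,l), and the remaining members of those conditioning sets. This reading reproduces both examples at an interior point of the rectangle. The function names are hypothetical and chosen only for this illustration.

```python
def omega(offsets, k, l):
    """Conditioning set omega(k,l), given as offsets relative to (k,l)."""
    return {(k + di, l + dj) for di, dj in offsets}


def omega_prime(offsets, k, l):
    """Assumed reading of omega'(k,l): parents, children and co-parents."""
    parents = omega(offsets, k, l)
    # Children: the points (u,v) whose set omega(u,v) contains (k,l).
    children = {(k - di, l - dj) for di, dj in offsets}
    neighbourhood = parents | children
    for u, v in children:
        # Adds the other members of omega(u,v), and (k,l) itself.
        neighbourhood |= omega(offsets, u, v)
    neighbourhood.discard((k, l))
    return neighbourhood


def gamma(r, k, l):
    """Square neighbourhood gamma_r(k,l) of (2.21)."""
    return {(i, j)
            for i in range(k - r, k + r + 1)
            for j in range(l - r, l + r + 1)} - {(k, l)}


k, l = 5, 5  # an interior point, so the boundary of the rectangle plays no role

three_point = [(-1, 0), (-1, -1), (0, -1)]  # class (2.29)
two_point = [(-1, 0), (0, -1)]              # class (2.31)

blanket_3 = omega_prime(three_point, k, l)
blanket_2 = omega_prime(two_point, k, l)
expected_2 = {(k - p1, l - p2)
              for p1 in (-1, 0, 1) for p2 in (-1, 0, 1) if p1 != p2}
```

Under this reading, `blanket_3` coincides with `gamma(1, k, l)` as in (2.30), and `blanket_2` coincides with the six-point set of (2.32).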


3. The ω-amount of Information Determined by a Unilateral Random Field.


Given an arbitrary random field $\xi$ with joint probability measure $P_\xi$, we can obviously calculate its marginal conditional probability measure

$P_{\xi_{kl} \mid \xi_{ij},\,(i,j)\in\omega(k,l)}(T_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l))$  (3.1)

for any given set $\omega(k,l)$ and any point $(k,l) \in \sigma$.

Consequently, for each given random field $\xi$, we can define a new random field $\xi(\omega)$ with joint probability measure given by (2.12), with marginal conditional probability measures given by (3.1).

In what follows we will make essential use of the concept of relative entropy. Given two probability measures $P_1$, $P_2$ over some measurable space (X,S), the quantity $h(P_1 : P_2)$ which takes the value

$\int_X P_1(dx) \log \frac{P_1(dx)}{P_2(dx)}$  (3.2)

if $P_1$ is absolutely continuous with respect to $P_2$ and $+\infty$ in the contrary case, is the relative entropy of $P_1$ with respect to $P_2$.

Definition. The quantity

$I_\omega(\xi) = h(\xi : \xi(\omega))$  (3.3)

is the ω-amount of information determined by the arbitrary random field $\xi$.

THEOREM 1.

$I_\omega(\xi) \ge 0$  (3.4)

with equality iff

$P_\xi = P_{\xi(\omega)}$  (3.5)

i.e. iff relation (2.11) holds for any $(k,l) \in \sigma$.

Let us denote by L the totality of random fields defined over (X,S) and by L(ω) its subset containing the totality of random fields of class ω.

Let us consider an arbitrary random field $\eta(\omega) \in L(\omega)$ defined with the help of the conditional probability measures

$P_{\eta_{kl} \mid \eta_{ij},\,(i,j)\in\omega(k,l)}(T_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l))$  (3.6)

so that its joint probability measure $P_{\eta(\omega)}$ is given by the expression

$P_{\eta(\omega)}(T) = \int_T \prod_{(k,l)\in\sigma} P_{\eta_{kl} \mid \eta_{ij},\,(i,j)\in\omega(k,l)}(dx_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l))$  (3.7)

In what follows we will consider the expression

$h(P_\xi : P_{\eta(\omega)})$  (3.8)

representing the relative entropy of $P_\xi$ with respect to $P_{\eta(\omega)}$.

THEOREM 2 (of best approximation). Let $\xi$ be an arbitrary random field. If $I_\omega(\xi)$ is finite, then

$\min\{h(P_\xi : P_{\eta(\omega)});\ \eta(\omega) \in L(\omega)\} = I_\omega(\xi)$  (3.9)

This minimum is reached iff

$P_{\eta(\omega)} = P_{\xi(\omega)}$  (3.10)

i.e. iff

$P_{\eta_{kl} \mid \eta_{ij},\,(i,j)\in\omega(k,l)} = P_{\xi_{kl} \mid \xi_{ij},\,(i,j)\in\omega(k,l)}$  (3.11)

for all points $(k,l) \in \sigma$.

LEMMA 5. If for all points $(k,l) \in \sigma$,

$\omega(k,l) \subset \tilde\omega(k,l)$  (3.12)

then, for any random field $\xi$,

$I_{\tilde\omega}(\xi) \le I_\omega(\xi)$.  (3.13)

4. Two Measures of Discrepancy.

In order to evaluate the discrepancy between the given random field $\xi \in L$ and the approximating random field $\xi(\omega) \in L(\omega)$ we may also use the distance in variation between their corresponding probability measures,

$\| P_\xi - P_{\xi(\omega)} \|$  (4.1)

where the norm is the total variation of the signed measure

$P_\xi - P_{\xi(\omega)}$  (4.2)

The relationship between (4.1) and (3.3) is given by the following

LEMMA 6.

$\| P_\xi - P_{\xi(\omega)} \|^2 \le 2\, I_\omega(\xi)$  (4.3)

5. The Projection Operator.

Let us denote by Z(ω) the mapping of L into L(ω) such that

$Z(\omega)\xi = \xi(\omega)$, $\xi \in L$, $\xi(\omega) \in L(\omega)$  (5.1)

for $\omega \subset \sigma$.

Considering L imbedded in the Banach space of signed measures over (X,S), with norm the total variation, it follows that the norm of any probability measure is one, so that in particular

$\| P_\xi \| = \| P_{\xi(\omega)} \| = 1$, $\omega \subset \sigma$  (5.2)

Consequently, the operator Z(ω) defined over L has norm one. It is easy to see that it is a nonlinear idempotent operator and that its restriction to the set L(ω) is the identity operator, so that Z(ω) is the projection operator on L(ω). For these reasons it makes sense to introduce the following

Definition.

(a) Z(ω) is the class ω projection operator.

(b) For a given random field $\xi \in L$, the element

$\xi(\omega) = Z(\omega)\xi \in L(\omega)$  (5.3)

is its class ω projection, i.e., its projection on L(ω).

(c) In particular, if

$\omega(k,l) = \gamma_r(k,l)$, $(k,l) \in \sigma$  (5.4)

from (a), (b) we obtain the definitions of the r-Markov projection operator and of the r-Markov projection of a random field.

So the result (3.10) in Theorem 2 does express the fact that the minimum in (3.9) is reached iff $\eta(\omega)$ is the projection of $\xi \in L$ on L(ω).

6. On the Expression of $I_\omega(\xi)$.

Let us suppose that

(a) the probability measure $P_\xi$ admits a probability density

$p_\xi(x)$, $x \in X$  (6.1)

(b) the probability measure $P_{\xi(\omega)}$ admits a probability density

$p_{\xi(\omega)}(x)$, $x \in X$  (6.2)

(c) the probability measure (2.11) admits a probability density

$p_{\xi_{kl} \mid \xi_{ij},\,(i,j)\in\omega(k,l)}(x_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l))$, $(k,l) \in \sigma$  (6.3)

Let us denote

(a) $h(\xi) = -\int_X P_\xi(dx) \log p_\xi(x)$  (6.4)

the (differential) entropy of the random field $\xi$;

(b) $h(\xi_{kl} \mid \xi_{ij},\ (i,j) \in \omega(k,l)) = \int P_{\xi_{\omega(k,l)}}(dx_{\omega(k,l)})\ h(\xi_{kl} \mid x_{ij},\ (i,j) \in \omega(k,l))$  (6.5)

the conditional (differential) entropy of the random variable $\xi_{kl}$ with respect to the random vector $\{\xi_{ij},\ (i,j) \in \omega(k,l)\}$.

LEMMA 7. If all quantities in the relation

$I_\omega(\xi) = \sum_{(k,l)\in\sigma} h(\xi_{kl} \mid \xi_{ij},\ (i,j) \in \omega(k,l)) - h(\xi)$  (6.6)

exist and are finite, this equation holds.

From Theorem 1 and Lemma 7 there follows the

LEMMA 8. For any random field $\xi$, and any class ω,

$h(\xi) \le \sum_{(k,l)\in\sigma} h(\xi_{kl} \mid \xi_{ij},\ (i,j) \in \omega(k,l))$  (6.7)

with equality iff

$P_\xi = P_{\xi(\omega)}$  (6.8)

i.e. $\xi$ is a random field of class ω.
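The relative entropy (3.2), the nonnegativity stated in Theorem 1, the best-approximation property of Theorem 2 and the entropy identity (6.6) can all be checked numerically on a small discrete example. The sketch below is only an illustration under simplifying assumptions that are not in the paper: the field is replaced by three binary sites on a line, the conditioning set of each site is its left neighbour (a one-dimensional stand-in for the class ω), sums replace integrals, and the probability weights are arbitrary made-up numbers.

```python
import itertools
import math


def relative_entropy(p, q):
    """h(P:Q) = sum p log(p/q); +inf if P is not absolutely continuous w.r.t. Q."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def marginal(p, sites):
    """Marginal distribution of the listed sites."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in sites)
        out[key] = out.get(key, 0.0) + px
    return out


def entropy(p):
    return -sum(px * math.log(px) for px in p.values() if px > 0.0)


# Arbitrary joint distribution of three binary sites (illustrative weights).
states = list(itertools.product((0, 1), repeat=3))
weights = [3, 1, 2, 2, 1, 4, 2, 1]
p = {s: w / sum(weights) for s, w in zip(states, weights)}

p0, p1 = marginal(p, (0,)), marginal(p, (1,))
p01, p12 = marginal(p, (0, 1)), marginal(p, (1, 2))

# Projection: product of the conditionals of p given the left neighbour,
# p(x0) p(x1|x0) p(x2|x1), the analogue of (2.12) with (3.1).
projection = {x: p01[x[:2]] * p12[x[1:]] / p1[(x[1],)] for x in states}

info = relative_entropy(p, projection)  # analogue of (3.3)

# Analogue of (6.6): sum of conditional entropies minus joint entropy.
cond_sum = entropy(p0) + (entropy(p01) - entropy(p0)) + (entropy(p12) - entropy(p1))
identity_rhs = cond_sum - entropy(p)

# Another member of the same class: the uniform distribution.
uniform = {x: 1.0 / len(states) for x in states}
other = relative_entropy(p, uniform)
```

In this toy setting `info` is nonnegative, agrees with `identity_rhs` up to rounding, and does not exceed `other`, in line with Theorem 1, Lemma 7 and Theorem 2 respectively.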
ABSTRACT. We study interpolation by bivariate quadratic splines on a criss-cross triangulated non-uniform rectangular grid. The data points are at the corners of the rectangles. We develop a quasi-interpolation formula giving optimal order of approximation $O(h^3)$ and provide an interpolation scheme whose order of approximation is $O(h^2)$. The results apply to functions in $C^3$ and for bounded mesh ratios.


I. INTRODUCTION. Given data $\{f(x_i,y_j)\}$, i=1,..,m; j=1,..,n representing function values at the nodes $\{(x_i,y_j)\}$ of a non-uniform rectangular grid, we wish to produce a $C^1$ function interpolating the data and approximating the function f on the rectangle $[x_1,x_m] \times [y_1,y_n]$, where we assume $x_1 < x_2 < \dots < x_m$ and $y_1 < y_2 < \dots < y_n$. The class of functions from which the interpolant will be chosen is the space of $C^1$ bivariate quadratic splines developed by Chui and Wang [CW]. These splines are the $C^1$ functions which are quadratic polynomials on each triangle of a criss-cross triangulated rectangular grid. Our data are assumed to be at the nodes of this rectangular grid, as shown in Figure 1.

[Figure 1: criss-cross triangulated rectangular grid with the data at the nodes.]

the  centers  of  the  rectangles  of  the  grid  defining  the  spline  space,  it  is  easy  to 
construct  examples  for  which  this  is  not  possible,  e.g. 


Xj--3, 


x2”-2,  x3“2. 


•3. 


The more standard choice of interpolating space employs quadratic tensor splines. One drawback of this choice is that the high degree (four) results in poor shape preserving properties. While we do not investigate here the shape preserving properties of our interpolants, their lower degree would seem to warrant the development of basic results concerning interpolation and approximation.

The  main  results  of  this  paper  are  as  follows: 


a) We provide a simple quasi-interpolation formula which (by definition) reproduces quadratic polynomials from their values at the data points, and which, when applied to data from arbitrary functions f, produces a spline having optimal order of approximation $O(h^3)$, where h is the maximum grid spacing.

b) We develop an explicit interpolation formula which produces an interpolant with approximation order $O(h^2)$ provided f is in $C^3$ and the global mesh ratio is bounded as $h \to 0$. The approximation order $O(h^2)$ for interpolation is optimal since it is known for one dimensional quadratic splines.
II. RESULTS. As developed in [CW], the spline space is spanned by the set of B-splines $B_{ij}(x,y)$, where the octagonal support of $B_{ij}$ and its function values at the data points are shown below in Figure 2.

[Figure 2: octagonal support of $B_{ij}$ and its function values at the data points.]

$A_i$, $A'_i$, $B_j$, $B'_j$ are defined in terms of mesh ratios as follows:

$A_i = \frac{x_i - x_{i-1}}{x_{i+1} - x_{i-1}}$,  $B_j = \frac{y_j - y_{j-1}}{y_{j+1} - y_{j-1}}$,

$A'_i = \frac{x_{i+1} - x_i}{x_{i+1} - x_{i-1}}$,  $B'_j = \frac{y_{j+1} - y_j}{y_{j+1} - y_{j-1}}$.


A spline function takes the form $\sum_{i,j} \beta_{ij} B_{ij}(x,y)$, where the coefficients $\beta_{ij}$ are to be determined. The interpolation equations are easily obtained as

L<6,u  1  <AiV  v  (Aiy  +  ei,j-,  *  <*;y  -  y  a) 

where $f_{ij} \equiv f(x_i, y_j)$ and i=1,..,m; j=1,..,n.

We proceed to construct a quasi-interpolation formula. This is by definition a linear mapping from the data into the spline space which reproduces quadratic polynomials from their data and such that each coefficient $\beta_{ij}$ of the spline function depends locally on data near $(x_i, y_j)$. The formula is developed by a successive guessing approach.

The B-spline $B_{ij}(x,y)$ has, as the corners of the central rectangle in its support, the data points $(x_i, y_j)$, $(x_{i+1}, y_j)$, $(x_i, y_{j+1})$, $(x_{i+1}, y_{j+1})$. It is perhaps then reasonable to guess that its corresponding coefficient $\beta_{ij}$ can be conveniently expressed in the form

$\beta_{ij} = \frac{f_{ij} + f_{i+1,j} + f_{i,j+1} + f_{i+1,j+1}}{4} + e_{ij}$  (2)

where $e_{ij}$ represents an error term. Plugging the formula (2) into the interpolation equations (1) gives an equation for the $e_{ij}$:

$L(e)_{ij} = -\frac{(x_i - x_{i-1})(x_{i+1} - x_i)}{2}\, \eta^{(1)}_{ij} - \frac{(y_j - y_{j-1})(y_{j+1} - y_j)}{2}\, \eta^{(2)}_{ij}$  (3)

where $\eta^{(1)}_{ij}$ and $\eta^{(2)}_{ij}$ are combinations of second divided differences of the data in the x and y directions respectively:

U)  _  “J  r  /  r_  ....  ,  ..  „  .  3 


n 


ij 

(2) 


B1 

4  f^xi-l»xi’xi+l^'yj-l)  +  4  f(^xi-l,xi,xi+l 


l»yj)  +  4  f([xi-l,xi’xi+l 


Ai 


t.Iy 


i-r  lJ,j-i,yj,yj+i 


])  +|f(*r[y 


y4.y4JJ>  +  it  f (* 


j-i,7j,7j+i 


i+l 


',yj-i’yj’yj« 


1'yj+i) 


(4) 

]) 
We note that both $\eta^{(1)}_{ij}$ and $\eta^{(2)}_{ij}$ are constants if f is a quadratic. If this is the case, it is a simple matter of substitution to show that (3) is satisfied if

$e_{ij} = -\frac{1}{2}(x_{i+1} - x_i)^2\, \eta^{(1)}_{ij} - \frac{1}{2}(y_{j+1} - y_j)^2\, \eta^{(2)}_{ij}$.

We can now write our quasi-interpolation formula.
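The fact that second divided differences are constant for quadratic data, whatever the spacing of the grid, is easy to check numerically. The sketch below is a one-dimensional illustration only; the grid, the test quadratic and the function names are made up for the example, and the mesh ratios follow the definitions $A_i = (x_i - x_{i-1})/(x_{i+1} - x_{i-1})$ and $A'_i = (x_{i+1} - x_i)/(x_{i+1} - x_{i-1})$.

```python
def second_divided_difference(f, x0, x1, x2):
    """Second divided difference f[x0, x1, x2] on arbitrary distinct nodes."""
    first_01 = (f(x1) - f(x0)) / (x1 - x0)
    first_12 = (f(x2) - f(x1)) / (x2 - x1)
    return (first_12 - first_01) / (x2 - x0)


def mesh_ratios(x, i):
    """Mesh ratios (A_i, A'_i) at an interior node x[i]."""
    span = x[i + 1] - x[i - 1]
    return (x[i] - x[i - 1]) / span, (x[i + 1] - x[i]) / span


def quadratic(t):
    """Illustrative quadratic with leading coefficient 2."""
    return 2.0 * t * t - 5.0 * t + 1.0


nodes = [-3.0, -2.0, 2.0, 3.0, 3.5]  # deliberately non-uniform
interior = range(1, len(nodes) - 1)

differences = [second_divided_difference(quadratic, nodes[i - 1], nodes[i], nodes[i + 1])
               for i in interior]
ratios = [mesh_ratios(nodes, i) for i in interior]
```

Every entry of `differences` equals the leading coefficient of the quadratic, independently of the spacing, and each pair in `ratios` sums to one.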


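Theorem 1: If f(x,y) is a quadratic polynomial and the coefficients $\beta_{ij}$ are given by

$\beta_{ij} = \frac{f_{ij} + f_{i+1,j} + f_{i,j+1} + f_{i+1,j+1}}{4} - \frac{1}{2}(x_{i+1} - x_i)^2\, \eta^{(1)}_{ij} - \frac{1}{2}(y_{j+1} - y_j)^2\, \eta^{(2)}_{ij} \equiv q(f)_{ij}$  (5)

where the rectangular grid with nodes $\{(x_i, y_j)\}$ covers the plane, then

$Q(f)(x,y) = \sum_{i,j} q(f)_{ij} B_{ij}(x,y)$  (6)

is equal to f(x,y) for all (x,y).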

Proof: The previous arguments have shown that if $\beta_{ij} = q(f)_{ij}$ then $\sum_{i,j} \beta_{ij} B_{ij}(x,y)$ interpolates f(x,y) at the nodes of the grid. It remains to show only that Q(f)(x,y) is in fact a quadratic polynomial, in which case it must coincide with f(x,y). Unfortunately, this task must apparently be carried out by brute force calculation. One approach is to calculate the second derivatives of $B_{ij}(x,y)$ (which are constant on each triangle) and then, successively choosing $f(x,y) = 1, x, y, x^2, y^2, xy$, to show that the second derivatives of Q(f) are constant. Of course various symmetries will reduce the amount of actual calculation. We omit the details here.


Henceforth, Q(f) as defined by (6) (via (4) and (5)) will be referred to as the quasi-interpolant of f. Standard arguments show that if f is in $C^3$ on the closed rectangle $[x_1,x_m] \times [y_1,y_n]$ then

$|f(x,y) - Q(f)(x,y)| = O(h^3)$  (7)

for (x,y) in the interior of the rectangle.


We now develop our interpolation scheme and investigate its order of approximation. The interpolation scheme takes the form

$\beta_{ij} = q(f)_{ij} + e_{ij}$  (8)

where $e_{ij}$ satisfies

$L(e)_{ij} = f_{ij} - L(q(f))_{ij} \equiv g_{ij}$.  (9)

Since $g_{ij}$ depends locally and linearly on the data and is identically zero for quadratic data, it follows from a standard Taylor polynomial approximation argument that if $f \in C^3$ on the rectangle defined by the grid points, then $g_{ij} = O(h^3)$ in the interior of the rectangle. If we could show that $e_{ij}$ satisfying (9) was also $O(h^3)$ then the interpolant would differ from the quasi-interpolant by $O(h^3)$ and so provide the same order of approximation. Unfortunately, $e_{ij} = O(h^3)$ is not necessarily true.
It is possible to write down an explicit solution for $e_{ij}$, since there is a simple formula for the fundamental function $\beta^{rs}_{ij}$ satisfying

$L(\beta^{rs})_{ij} = \delta_{ir}\,\delta_{js}$  (10)

namely


ij 


(,rrt',r-l)<WVl> 


‘Wr*  <  VVl>  <V.-H",s) <  Wl>  >0U 


rs 


(ID 


rs 

where $\sigma^{rs}_{ij}$, as a function of i and j for a given r and s, is ±1 according to the following pattern in Figure 3:


+  -  +  __  +  _  +  _  + 

-+-++-+-+- 

+-+--+-+-+  The  circled  +  is  situated  at  the  (r,s)  position. 

-  +  -  +©  -  +  -  +  - 
-  +  -  +  +  -  +  -  +  - 
+  -  + - +  -  +  -+  Fig.  3 


We can then write

$e_{ij} = \sum_{r,s} \beta^{rs}_{ij}\, g_{rs}$  (12)


where, for fixed i and j, $\beta^{rs}_{ij}$ has the same sign pattern as in Figure 3, with (i,j) and (r,s) interchanged. If the mesh ratios are bounded then $\beta^{rs}_{ij} = O(1)$ and $e_{ij} = O(mn)\,\|g\| = O(h)$. It is in fact possible that $e_{ij}$ has the same order of magnitude as h, for the $g_{rs}$ may alternate in sign due to oscillations in the mesh spacing. While this would suggest that $|f(x,y) - \sum_{i,j} \beta_{ij} B_{ij}(x,y)| = O(h)$ is best possible for the interpolant, in fact $O(h^2)$ can be obtained, for it turns out that $\sum_{i,j} e_{ij} B_{ij}(x,y) = O(h^2)$ even if the $e_{ij}$ as given by (12) are O(h). We have

$\sum_{i,j} e_{ij} B_{ij}(x,y) = \sum_{i,j} \Big( \sum_{r,s} \beta^{rs}_{ij}\, g_{rs} \Big) B_{ij}(x,y) = \sum_{r,s} g_{rs} \Big( \sum_{i,j} \beta^{rs}_{ij} B_{ij}(x,y) \Big)$  (13)


Next we use the linear dependence of the $B_{ij}$ as shown in [CW]:

$\sum_{i,j} (-1)^{i+j} (x_{i+1} - x_i)(y_{j+1} - y_j)\, B_{ij}(x,y) \equiv 0$.  (14)

It is not difficult to see now that $\sum_{i,j} \beta^{rs}_{ij} B_{ij}(x,y)$, by virtue of (11) and (14), is, for a fixed (x,y), zero outside of a cross-shaped region centered at (x,y) whose arms are only a bounded number of nodes in width and which consequently contains O(m+n) nodes. It follows that $\sum_{r,s} g_{rs} \big( \sum_{i,j} \beta^{rs}_{ij} B_{ij}(x,y) \big)$ has only O(m+n) non-zero terms. Since the mesh ratios are bounded, m and n are both $O(h^{-1})$, and since $\|g\| = O(h^3)$ we obtain $\sum_{i,j} e_{ij} B_{ij}(x,y) = O(m+n)\,\|g\| = O(h^2)$ using (13).
i.j  iJ  tS 
The approximation order of the interpolant can then be estimated as

$|f(x,y) - \sum_{i,j} \beta_{ij} B_{ij}(x,y)| \le |f(x,y) - \sum_{i,j} q(f)_{ij} B_{ij}(x,y)| + |\sum_{i,j} e_{ij} B_{ij}(x,y)| = O(h^3) + O(h^2) = O(h^2)$.

The above discussion requires some formalizing, especially as regards the lack of data beyond the boundary of $[x_1,x_m] \times [y_1,y_n]$. The precise statement of the approximation result is given in the following Theorem:


Theorem 2: Suppose the following hold:

a) $f \in C^3(\Omega)$, where $\Omega$ is an open set containing the rectangle $E = [a,c] \times [b,d]$.

b) $(x_i, y_j)$, i=1,..,m; j=1,..,n is any set of grid nodes such that

i) $(x_i, y_j) \in E$

ii) $\max\{ h/|x_{i+1} - x_i|,\ h/|y_{j+1} - y_j| \} < M$, where M is a constant independent of m and n and $h = \max\{ |x_{i+1} - x_i|,\ |y_{j+1} - y_j| \}$.

c) The augmented data set $\{(x_i, y_j, f_{ij})\}$ is defined as follows:

i) $f_{ij} = f(x_i, y_j)$;

ii) $x_{i+1} - x_i = h$ for $i \ge m$ and $i \le 0$; $y_{j+1} - y_j = h$ for $j \ge n$ and $j \le 0$; $f_{ij}$ is defined for all $(i,j) \notin \{1,..,m\} \times \{1,..,n\}$ using a local extrapolation formula exact for quadratics.

d) The coefficients $\beta_{ij}$ of the spline function $\sum_{i,j} \beta_{ij} B_{ij}(x,y)$, i=0,..,m+1; j=0,..,n+1 are chosen as follows:

$\beta_{ij} = q(f)_{ij} + e_{ij}$

where $q(f)_{ij}$ is defined in (5) (via (4)) and $e_{ij}$ is defined by

$e_{ij} = \sum_{r=1,..,m;\ s=1,..,n} \beta^{rs}_{ij}\, g_{rs}$

where $g_{rs}$ is defined in (9) and $\beta^{rs}_{ij}$ is defined in (11).

Then $\sum_{i,j} \beta_{ij} B_{ij}(x,y)$ interpolates f(x,y) on the set $\{(x_i, y_j)\}$, i=1,..,m; j=1,..,n, and $|f(x,y) - \sum_{i,j} \beta_{ij} B_{ij}(x,y)| = O(h^2)$ uniformly for $(x,y) \in [x_1,x_m] \times [y_1,y_n]$.
Proof: The extrapolated values of $f_{ij}$ differ from the actual values $f(x_i, y_j)$ by $O(h^3)$ for $(x_i, y_j)$ within O(h) of the rectangle $[x_1,x_m] \times [y_1,y_n]$. It follows that if $q(f)_{ij}$ is calculated from the augmented data set, we still have $|f(x,y) - \sum_{i,j} q(f)_{ij} B_{ij}(x,y)| = O(h^3)$ for (x,y) in the rectangle, uniformly. Similarly, when calculated using the augmented data set, $g_{ij} = O(h^3)$ for (i,j) in the original data set. The sum in (12) is restricted to (r,s) in the original data set since this ensures interpolation on the original data points and minimizes the number of terms in the sum. The rest of the proof follows from previous arguments.


We remark that there is no essential requirement that the data be given on an entire rectangular grid $\{x_i\}_{i=1}^{m} \times \{y_j\}_{j=1}^{n}$. Our quasi-interpolant and interpolation scheme apply just as well if, say, we wish to interpolate at the nodes of a rectangular grid lying inside a prespecified open region.
ON THE C2 CONTINUITY OF PIECEWISE CUBIC HERMITE POLYNOMIALS

WITH  UNEQUAL  INTERVALS 


C.  N.  Shen 

U.S.  Army  Armament,  Munitions,  and  Chemical  Command 
Armament  Research  and  Development  Center 
Close  Combat  Armaments  Center 
Benet  Weapons  Laboratory 
Watervliet, NY 12189-4050


ABSTRACT. Piecewise cubic Hermite polynomials are usually only $C^1$ continuous. With the introduction of smoothing within the intervals, the second derivatives can be made continuous. This may be applied to the autonomous vehicle problem with unequal laser scanning.

In using a laser range finder to measure the range, the direction of the laser rays can be subject to angular errors. These errors, in the direction of the elevation angle, affect the determination of in-path slopes for navigation of autonomous vehicles. A nonuniform grid may be employed in computation by the spline function method with cubic Hermite polynomials. For the purpose of smoothing, it is essential to obtain continuous second derivatives at the grid point from both sides.

I. INTRODUCTION. The smoothing of gradients can be obtained by using an optimization method for approximation involving spline functions. A nonuniform grid may be employed in computation by the spline function method with cubic Hermite polynomials. Continuous second derivatives at the grid point from both sides are essential for the purpose of smoothing. This method can be applied to answer the following questions: whether the platform can climb the estimated in-path slope, and whether it will tip over on the estimated cross-path slope.

II. RECURSIVE FILTERING AND SMOOTHING PROCEDURE. A spline function $s(\xi)$ is a solution to the optimization problem

$J^* = \min_{h \in C^2} \Big\{ \sum_{i=1}^{N} [h(\beta_i) - m_i]^T R_i\, [h(\beta_i) - m_i] + \rho \sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot h(\xi)]^2\, d\xi \Big\}$  (1)

where for clarity and simplicity in discussion we only consider the cubic spline case. Higher order polynomial splines can also be treated in a similar manner with more complicated computation.

A cubic spline, s, is a piecewise polynomial of class $C^2$ which has many good properties, such as the minimum norm property and local base property [1,2]. From approximation theory we know that for each set $A = \{a_1, \dots, a_N, a'_1, a'_N\}$, there exists a unique cubic spline $s(\xi; A)$ such that

$s(\beta_i; A) = a_i$, i = 1,2,...,N  (2)

$\dot s(\beta_i; A) = a'_i$, i = 1,N  (3)
where $\dot s$ is the first derivative of the function s. The above equations can be thought of as boundary conditions for the piecewise cubic spline interpolation given a set of data $(\beta_i, a_i)$, for i = 1,2,...,N. Thus, solving the problem in Eq. (1) is equivalent to determining a set of constraints A for the optimization problem:

$J^* = \min_{A} \Big\{ \sum_{i=1}^{N} [s(\beta_i; A) - m_i]^T R_i\, [s(\beta_i; A) - m_i] + \rho \sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot s(\xi; A)]^2\, d\xi \Big\}$  (4)

Instead of taking a direct approach to find an optimal set of constraints for the problem above, it is proposed to further transform this problem into a form that is convenient to solve. From the theory of numerical analysis [3], it is well known that a piecewise cubic Hermite polynomial $p(\xi)$ is in the family of $C^1$. For each set $B = A \cup A^c$, where $A^c$ is a complement of A, i.e., $A^c = \{a'_i,\ i = 2,3,\dots,N-1\}$, so that $B = \{a_i, a'_i,\ i = 1,2,\dots,N\}$, there exists a unique piecewise cubic Hermite polynomial $p(\xi; B)$ such that

$p(\beta_i; B) = a_i$, i = 1,2,...,N  (5)

$\dot p(\beta_i; B) = a'_i$, i = 1,2,...,N  (6)

where $\dot p$ is the first derivative of p.

It should also be noted that for each set A, there are an infinite number of piecewise Hermite polynomials $p(\xi; A)$ such that

$p(\beta_i; A) = a_i$, i = 1,2,...,N  (7)

$\dot p(\beta_i; A) = a'_i$, i = 1,N  (8)

Let the set of $p(\xi; A)$ which satisfy the constraints in the equations above be P, i.e.,

$P = \{p(\xi; A):\ (7), (8)\ \text{satisfied}\}$  (9)

With reference to the paper by de Boor [4], it is noted that there exists a unique cubic spline $s(\xi; A)$ in the set P. Also from the minimum norm property of a cubic spline we have the following relation

$\sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot s(\xi; A)]^2\, d\xi \le \sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot p(\xi; A)]^2\, d\xi$  (9)

That is

$\sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot s(\xi; A)]^2\, d\xi = \inf_{p \in P} J_P(p)$  (10)

where

$J_P = \sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot p(\xi; A)]^2\, d\xi$  (11)
Since a cubic spline $s(\xi; A)$ is unique, a piecewise cubic Hermite polynomial $p(\xi; A)$ which minimizes the smoothing integral $J_P$ in the above equation with respect to $A^c$ becomes a cubic spline $s(\xi; A)$. To be more precise, we have the following theorem.

THEOREM: Let P represent a set of piecewise cubic Hermite polynomials p which satisfy the constraints below

$p(\beta_i; A^c) = a_i$, i = 1,2,...,N  (12)

$\dot p(\beta_i; A^c) = a'_i$, i = 1,N  (13)

where $p \in C^1$, and A and $A^c$ are the same as mentioned before. Then there exists a unique cubic spline $s(\xi)$ such that

$\sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot s(\xi)]^2\, d\xi = \min_{A^c} \sum_{i=2}^{N} \int_{\beta_{i-1}}^{\beta_i} [\ddot p(\xi; A^c)]^2\, d\xi$  (14)

where $\ddot s$ and $\ddot p$ are the second derivatives of the functions s and p, and $s \in C^2$. A simple example with N = 3 is given next.

III. EXAMPLE FOR C2 CONTINUITY. For convenience and simplicity, we only consider a special case with N = 3. The node points are given as $\beta_1$, $\beta_2$, and $\beta_3$. The intervals are not equal, i.e.,

$(\beta_2 - \beta_1) \ne (\beta_3 - \beta_2)$  (15)

Let a set of piecewise cubic Hermite polynomials p(t) be

$P = \{p(t; A^c),\ p \in C^1[t_1, t_3],\ \dot p(t_2) = a,\ a \in A^c\}$  (16)

which satisfies the constraints in the equations below

$p(t_i; A^c) = a_i$, for i = 1,2,3
$\dot p(t_i; A^c) = a'_i$, for i = 1,3  (17)

In this special case, the set $A^c = \{a'_2\} = \{a\}$.

We want to show here that the cubic Hermite polynomial $p(t; A^c)$ which is obtained by minimizing the smoothing integral will become a cubic spline function $s(t) \in C^2[t_1, t_3]$:

$J^* = \min_{A^c} \Big\{ \int_{t_1}^{t_2^-} [\ddot p(t; A^c)]^2\, dt + \int_{t_2^+}^{t_3} [\ddot p(t; A^c)]^2\, dt \Big\} = \min_{a} \Big\{ \int_{t_1}^{t_2^-} [\ddot p(t; a)]^2\, dt + \int_{t_2^+}^{t_3} [\ddot p(t; a)]^2\, dt \Big\}$  (18)
From Eq. (A14) of the Appendix, the smoothing integral above can be written as

$J(a) = (x_2 - A_1 x_1)^T B_1^{-1} (x_2 - A_1 x_1) + (x_3 - A_2 x_2)^T B_2^{-1} (x_3 - A_2 x_2)$  (19)

where $A_{i-1}$, $B_{i-1}^{-1}$, and $x_i$ are defined in the Appendix, and

$x_i = (a_i, a'_i)^T$, with $a'_2 = a$, i = 1,2,3  (20)

$\Delta_{i-1} = d_{i-1} = t_i - t_{i-1}$  (21)

Using Eqs. (A11) and (A12), the functional J(a) is written as

$J(a) = 12 d_1^{-3} (a_2 - a_1 - d_1 a'_1)^2 - 12 d_1^{-2} (a_2 - a_1 - d_1 a'_1)(a - a'_1) + 4 d_1^{-1} (a - a'_1)^2$
$\quad + 12 d_2^{-3} (a_3 - a_2 - d_2 a)^2 - 12 d_2^{-2} (a_3 - a_2 - d_2 a)(a'_3 - a) + 4 d_2^{-1} (a'_3 - a)^2$  (22)

Taking the partial derivative with respect to a yields

$\frac{\partial J}{\partial a} = -12 d_1^{-2} (a_2 - a_1 - d_1 a'_1) + 8 d_1^{-1} (a - a'_1)$
$\quad + 24 d_2^{-3} (a_3 - a_2 - d_2 a)(-d_2) - 12 d_2^{-2} (-d_2)(a'_3 - a)$
$\quad - 12 d_2^{-2} (-1)(a_3 - a_2 - d_2 a) - 8 d_2^{-1} (a'_3 - a) = 0$  (23)
Solving the equation above for a, one obtains

$a^* = [3 d_1^{-2} (a_2 - a_1) - d_1^{-1} a'_1 + 3 d_2^{-2} a_3 - 3 d_2^{-2} a_2 - d_2^{-1} a'_3] / [2 (d_1^{-1} + d_2^{-1})]$  (24)

To show that $p(t; a^*) \in C^2[t_1, t_3]$, we only need to show that

$\lim_{t \to t_2^-} \ddot p(t; a^*) = \lim_{t \to t_2^+} \ddot p(t; a^*)$  (25)

That is, for the piecewise cubic Hermite polynomial p(t),

$\ddot p_{1,2}(t_2; a^*) = \ddot p_{2,3}(t_2; a^*)$  (26)

where $p_{1,2}$ is the cubic Hermite polynomial within the interval $\beta_1$ and $\beta_2$, and $p_{2,3}$ is the cubic Hermite polynomial within the interval $\beta_2$ and $\beta_3$.

Now from the definition of piecewise cubic Hermite polynomial in the Appendix, we have

$\ddot p_{1,2}(t_2; a^*) = 6 d_1^{-2} (a_1 - a_2) + 2 d_1^{-1} a'_1 + 4 d_1^{-1} a^*$  (27)

By using Eq. (24), the above equation can be expressed as

$\ddot p_{1,2}(t_2; a^*) = [-6 a_2 (d_1^{-1} + d_2^{-1}) + 6 (a_1 d_1^{-1} + a_3 d_2^{-1}) + 2 (a'_1 - a'_3)] / (d_1 + d_2)$  (28)

In a like manner, omitting the detailed derivation, we obtain easily

$\ddot p_{2,3}(t_2; a^*) = [-6 a_2 (d_1^{-1} + d_2^{-1}) + 6 (a_1 d_1^{-1} + a_3 d_2^{-1}) + 2 (a'_1 - a'_3)] / (d_1 + d_2)$  (29)

Thus, Eq. (26) is always true, that is, the conclusion in the Theorem is valid. It is proved that $C^2$ continuity results from the optimization procedure for piecewise cubic Hermite polynomials with unequal intervals.
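The N = 3 example can be checked numerically. The sketch below evaluates the optimal interior slope of Eq. (24), compares the one-sided second derivatives at the middle node, and confirms that the slope does minimize the smoothing integral of Eq. (22). The node values, end slopes and interval lengths are made-up illustrative numbers, and the function names are hypothetical.

```python
def second_derivative(a0, m0, a1, m1, d, t):
    """Second derivative at local t in [0, d] of the cubic Hermite polynomial
    with end values a0, a1 and end slopes m0, m1."""
    return (6 * (2 * t - d) / d**3 * a0 + (6 * t - 4 * d) / d**2 * m0
            + 6 * (d - 2 * t) / d**3 * a1 + (6 * t - 2 * d) / d**2 * m1)


def interval_integral(a0, m0, a1, m1, d):
    """Integral of the squared second derivative over one interval,
    in the quadratic form used in Eq. (22)."""
    u = a1 - a0 - d * m0
    v = m1 - m0
    return 12 * u * u / d**3 - 12 * u * v / d**2 + 4 * v * v / d


def optimal_slope(a1, a2, a3, s1, s3, d1, d2):
    """Interior slope a* of Eq. (24)."""
    numerator = (3 * (a2 - a1) / d1**2 - s1 / d1
                 + 3 * (a3 - a2) / d2**2 - s3 / d2)
    return numerator / (2 * (1 / d1 + 1 / d2))


# Illustrative data: unequal intervals d1 = 1 and d2 = 2.
a1, a2, a3 = 0.0, 1.0, 2.0   # values at the three nodes
s1, s3 = 0.5, 2.0            # prescribed end slopes a'_1 and a'_3
d1, d2 = 1.0, 2.0


def total_integral(a):
    return (interval_integral(a1, s1, a2, a, d1)
            + interval_integral(a2, a, a3, s3, d2))


a_star = optimal_slope(a1, a2, a3, s1, s3, d1, d2)
left = second_derivative(a1, s1, a2, a_star, d1, d1)    # limit from the left
right = second_derivative(a2, a_star, a3, s3, d2, 0.0)  # limit from the right
closed_form = (-6 * a2 * (1 / d1 + 1 / d2) + 6 * (a1 / d1 + a3 / d2)
               + 2 * (s1 - s3)) / (d1 + d2)             # Eq. (28)
```

For these numbers `a_star` is 0.75 and both one-sided second derivatives equal -2, in agreement with the closed form of Eq. (28); perturbing `a_star` in either direction increases `total_integral`.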

IV.  CONCLUSION.  For  scanning  in  the  direction  of  elevation  angle  from  the 
top  of  a  mast  where  a  laser  is  located,  the  intervals  needed  in  angles  are  small 
for  far  away  targets,  while  the  same  are  large  for  close-by  objects.  The 
smoothing  algorithm  discussed  in  this  paper  indicates  that  cubic  Hermite  polyno¬ 
mials  can  be  used  for  unequal  intervals  or  nonuniform  grids. 
APPENDIX

EVALUATION OF THE SMOOTHING INTEGRAL

A piecewise cubic Hermite polynomial in the interval $[\beta_{i-1}, \beta_i]$ is represented in terms of the basis functions and the state vectors $x_{i-1}$, $x_i$, where the state vectors are defined as in Eq. (20). By changing the independent variable as below,

$t = \xi - \beta_{i-1}$  (A1)

the smoothing integral in the interval $[\beta_{i-1}, \beta_i]$ becomes

$I_{i-1,i} = \int_0^{\Delta_{i-1}} [\ddot p_{i-1,i}(t)]^2\, dt$  (A2)

where

$\Delta_{i-1} = t_i - t_{i-1} = \beta_i - \beta_{i-1}$.  (A3)

With the change of the variable above, the second derivative of the Hermite polynomial can be written as

$\ddot p_{i-1,i}(t) = [\ddot\phi_{i,1}(t),\ \ddot\psi_{i,1}(t),\ \ddot\phi_{i,0}(t),\ \ddot\psi_{i,0}(t)]\ [x_i^T,\ x_{i-1}^T]^T$  (A4)

where the second derivatives of the basis functions can be derived as follows. Using the change of variables, we rewrite the basis functions as

$\phi_{i,1}(t) = t^2 (3\Delta_{i-1} - 2t) / \Delta_{i-1}^3$
$\psi_{i,1}(t) = t^2 (t - \Delta_{i-1}) / \Delta_{i-1}^2$
$\phi_{i,0}(t) = (\Delta_{i-1} - t)^2 (\Delta_{i-1} + 2t) / \Delta_{i-1}^3$
$\psi_{i,0}(t) = t (\Delta_{i-1} - t)^2 / \Delta_{i-1}^2$  (A5)
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Then,  taking  the  second  derivative  with  respect  to  t  yields 

4>i  fl(t)  =  6(Ai.1-2t)/Ai_13 

*i,l  =  (Bt-aAi.jl/Ai.j* 


♦  i , 0  =  6(2t-Ai_j)/Ai_j3 
*1,0  =  {et-AAi-il/A,--!2 

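Therefore, the integrand of the smoothing integral is expressed as

$$ [p''_{i-1,i}(t)]^2 = \begin{bmatrix} x_i \\ x_{i-1} \end{bmatrix}^T K_{i-1,i}(t) \begin{bmatrix} x_i \\ x_{i-1} \end{bmatrix} \tag{A6, A7} $$

where $K_{i-1,i}$ is defined as

$$ K_{i-1,i}(t) = \begin{bmatrix}
\phi''_{i,1}\phi''_{i,1} & \phi''_{i,1}\psi''_{i,1} & \phi''_{i,1}\phi''_{i,0} & \phi''_{i,1}\psi''_{i,0} \\
\psi''_{i,1}\phi''_{i,1} & \psi''_{i,1}\psi''_{i,1} & \psi''_{i,1}\phi''_{i,0} & \psi''_{i,1}\psi''_{i,0} \\
\phi''_{i,0}\phi''_{i,1} & \phi''_{i,0}\psi''_{i,1} & \phi''_{i,0}\phi''_{i,0} & \phi''_{i,0}\psi''_{i,0} \\
\psi''_{i,0}\phi''_{i,1} & \psi''_{i,0}\psi''_{i,1} & \psi''_{i,0}\phi''_{i,0} & \psi''_{i,0}\psi''_{i,0}
\end{bmatrix} \tag{A8} $$

with every basis function evaluated at t. By utilizing the above equation, the smoothing integral becomes

$$ I_{i-1,i} = \begin{bmatrix} x_i \\ x_{i-1} \end{bmatrix}^T \left[ \int_0^{\Delta_{i-1}} K_{i-1,i}(t)\, dt \right] \begin{bmatrix} x_i \\ x_{i-1} \end{bmatrix} \tag{A9} $$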


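Evaluating the above integral, we obtain

$$ \int_0^{\Delta_{i-1}} K_{i-1,i}(t)\, dt = \begin{bmatrix}
12/\Delta_{i-1}^3 & -6/\Delta_{i-1}^2 & -12/\Delta_{i-1}^3 & -6/\Delta_{i-1}^2 \\
-6/\Delta_{i-1}^2 & 4/\Delta_{i-1} & 6/\Delta_{i-1}^2 & 2/\Delta_{i-1} \\
-12/\Delta_{i-1}^3 & 6/\Delta_{i-1}^2 & 12/\Delta_{i-1}^3 & 6/\Delta_{i-1}^2 \\
-6/\Delta_{i-1}^2 & 2/\Delta_{i-1} & 6/\Delta_{i-1}^2 & 4/\Delta_{i-1}
\end{bmatrix} \tag{A10} $$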


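By defining matrices $A_{i-1}$ and $B_{i-1}$ as follows

$$ A_{i-1} = \begin{bmatrix} 1 & \Delta_{i-1} \\ 0 & 1 \end{bmatrix} \tag{A11} $$

$$ B_{i-1}^{-1} = \begin{bmatrix} 12\Delta_{i-1}^{-3} & -6\Delta_{i-1}^{-2} \\ -6\Delta_{i-1}^{-2} & 4\Delta_{i-1}^{-1} \end{bmatrix} \tag{A12} $$

where $B_{i-1}$ is a symmetric matrix, Equation (A10) can be expressed as

$$ \int_0^{\Delta_{i-1}} K_{i-1,i}(t)\, dt = \begin{bmatrix} B_{i-1}^{-1} & -B_{i-1}^{-1} A_{i-1} \\ (-B_{i-1}^{-1} A_{i-1})^T & A_{i-1}^T B_{i-1}^{-1} A_{i-1} \end{bmatrix} \tag{A13} $$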


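where $B_{i-1}$ and $A_{i-1}$ are functions of the variable $\Delta_{i-1}$. By using the above notation, Eq. (A9) is rewritten as

$$ I_{i-1,i} = (x_i - A_{i-1} x_{i-1})^T B_{i-1}^{-1} (x_i - A_{i-1} x_{i-1}) \tag{A14} $$

$$ \rho I_{i-1,i} = x_{i-1}^T C_{i-1} x_{i-1} + 2 x_{i-1}^T D_{i-1} x_i + x_i^T E_{i-1} x_i \tag{A15} $$

where

$$ C_{i-1} = \rho A_{i-1}^T B_{i-1}^{-1} A_{i-1} \tag{A16} $$

$$ D_{i-1} = -\rho A_{i-1}^T B_{i-1}^{-1} \tag{A17} $$

$$ E_{i-1} = \rho B_{i-1}^{-1} \tag{A18} $$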

Thus,  the  smoothing  integral  is  transformed  into  the  above  quadratic  form. 
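The closed forms in Eqs. (A10) and (A13) can be checked mechanically. The following is a minimal sketch and not part of the original paper: it assumes the second derivatives of the basis functions as reconstructed from Eq. (A5), ordered to match the stacked vector $[x_i; x_{i-1}]$, and it uses exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction


def second_derivs(t, d):
    """Second derivatives of the four cubic Hermite basis functions on [0, d],
    ordered as (value at right end, slope at right end, value at left end, slope at left end)."""
    return [6 * (d - 2 * t) / d**3, (6 * t - 2 * d) / d**2,
            6 * (2 * t - d) / d**3, (6 * t - 4 * d) / d**2]


def integrate_K(d):
    """Integrate the outer product of second derivatives over [0, d].
    Each entry is a quadratic in t, so Simpson's rule is exact."""
    fa, fm, fb = (second_derivs(t, d) for t in (Fraction(0), d / 2, d))
    return [[d / 6 * (fa[r] * fa[c] + 4 * fm[r] * fm[c] + fb[r] * fb[c])
             for c in range(4)] for r in range(4)]


def closed_form(d):
    """The 4x4 matrix of Eq. (A10)."""
    return [[12 / d**3, -6 / d**2, -12 / d**3, -6 / d**2],
            [-6 / d**2, 4 / d, 6 / d**2, 2 / d],
            [-12 / d**3, 6 / d**2, 12 / d**3, 6 / d**2],
            [-6 / d**2, 2 / d, 6 / d**2, 4 / d]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def block_form(d):
    """The block matrix of Eq. (A13), built from A (A11) and B^{-1} (A12)."""
    Binv = [[12 / d**3, -6 / d**2], [-6 / d**2, 4 / d]]
    A = [[Fraction(1), d], [Fraction(0), Fraction(1)]]
    BA = matmul(Binv, A)
    ABA = matmul(transpose(A), BA)
    BAt = transpose(BA)
    top = [Binv[r] + [-v for v in BA[r]] for r in range(2)]
    bottom = [[-v for v in BAt[r]] + ABA[r] for r in range(2)]
    return top + bottom
```

Calling the three functions with any positive `Fraction` interval length, for example `Fraction(3, 2)`, should return identical matrices if the reconstruction is consistent.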
1. INTRODUCTION

Weierstrass provided the first example of a function which is continuous but nowhere
differentiable.(1) Hardy(2) later established that under certain conditions this function even fails
to have a well-defined infinite derivative, i.e., a vertical tangent. Weierstrass never published his
result but only read it to the Berlin Academy on July 18, 1872. It was first published in 1875 by
Paul du Bois-Reymond(3), who wrote that Weierstrass' result was "equally too strange for immediate
perception as well as for actual understanding" and "will lead to the limit of our intellect."

Singh(4) has written a treatise on nondifferentiable functions and provides some examples of
infinite series as well as geometric constructions, such as Koch curves, which are nondifferentiable.
Rather than being hailed as a great mathematical discovery, the Weierstrass function was presented
more as a pathological curiosity and used as a counterexample to warn freewheeling users of
mathematics that care must sometimes be exercised when attempting to manipulate mathematical
functions. It took the genius of Mandelbrot(1) to bestow honor on a large collection of
"pathological" mathematical objects, such as the Weierstrass function. This function is
nondifferentiable because it has wiggles on all scales. This absence of a characteristic scale is the
paradigm of Mandelbrot's fractals.

We review in the next section how the Weierstrass function appears naturally in a 1-D random
walk context.(5) In Section 3 we review, through the random walk framework, a manner in
which to generalize the Weierstrass function.(6) Although we do not prove that the novel
mathematical functions observed are nondifferentiable, we do provide suggestive numerical
evidence. The proof or disproof of being everywhere nondifferentiable is an open question left to
those with a stronger mathematics background than the authors.

A cautionary example is the function of Riemann, $\sum_{n=1}^{\infty} \sin(n^2 x)/n^2$, which was assumed to be
nowhere differentiable until Gerver in 1970 proved that it possesses a derivative equal to $-1/2$ at
values of x which are $\pi$ times a ratio of odd integers in lowest terms.


2. THE WEIERSTRASS FUNCTION AND RANDOM WALK

Consider a random walker on an infinite 1-D lattice of unit spacing whose initial position is
the origin. The probability of being at site $l$ after $n$ steps is denoted by $P_n(l)$. From the
Chapman-Kolmogorov equation

$$ P_{n+1}(l) = \sum_{l'} P_n(l - l')\, p(l') \tag{1} $$
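where $p(l)$ is the probability of making a jump of length $l$. Equation 1 is in a convolution form
and is easily handled in Fourier space. Define

$$ P_n(k) = \sum_l e^{ikl} P_n(l) \tag{2a} $$

$$ p(k) = \sum_l e^{ikl} p(l) \tag{2b} $$

and

$$ \langle l^n \rangle = \sum_l l^n p(l) \tag{2c} $$

Then from Equation 1

$$ P_n(k) = [p(k)]^n $$

or inverse Fourier transforming

$$ P_n(l) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikl} [p(k)]^n \, dk \tag{3} $$

For an unbiased walk, $\langle l \rangle = 0$. If $\langle l^2 \rangle$ is finite, then for $k \to 0$

$$ p(k) \sim 1 - \tfrac{1}{2} k^2 \langle l^2 \rangle + o(k^2) \tag{4} $$

and

$$ P_n(l) \approx \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikl}\, e^{-\frac{1}{2} n k^2 \langle l^2 \rangle} \, dk \tag{5} $$

$$ \approx \frac{1}{\sqrt{2\pi n \langle l^2 \rangle}}\, \exp\!\left( -\frac{l^2}{2 n \langle l^2 \rangle} \right) \tag{6} $$

which is the standard Gaussian behavior ensured by the Central Limit Theorem.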
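The passage from Equation 1 to the Gaussian limit of Equation 6 can be illustrated numerically. The sketch below is not from the original paper: it assumes a hypothetical "lazy" step distribution p(-1) = p(1) = 1/4, p(0) = 1/2 (so that $\langle l^2 \rangle = 1/2$), iterates the Chapman-Kolmogorov convolution directly, and compares the result with the Gaussian form.

```python
import math


def step_once(P, p):
    """One application of Eq. 1: P_{n+1}(l) = sum over l' of P_n(l - l') p(l')."""
    out = {}
    for site, prob in P.items():
        for jump, weight in p.items():
            out[site + jump] = out.get(site + jump, 0.0) + prob * weight
    return out


def walk_distribution(p, n):
    """Distribution after n steps for a walker that starts at the origin."""
    P = {0: 1.0}
    for _ in range(n):
        P = step_once(P, p)
    return P


def gaussian(l, n, var):
    """Right-hand side of Eq. 6 with <l^2> = var."""
    return math.exp(-l * l / (2 * n * var)) / math.sqrt(2 * math.pi * n * var)


if __name__ == "__main__":
    p = {-1: 0.25, 0: 0.5, 1: 0.25}   # assumed example step distribution
    P = walk_distribution(p, 200)
    for l in (0, 5, 10, 20):
        print(l, P[l], gaussian(l, 200, 0.5))
```

For this finite-variance choice the two columns agree closely after a few hundred steps; the heavy-tailed step distributions discussed next are exactly the cases where this agreement fails.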

However, if $\langle l^2 \rangle$ is infinite, then different behavior will result, as first described in general
by P. Levy in the 1920's. Essentially, if $p(l) \sim |l|^{-1-\beta}$ with $0 < \beta < 2$, then $\langle |l|^{\beta} \rangle$ is the
lowest moment which diverges and

$$ p(k) \sim 1 - \text{const.}\, |k|^{\beta} \tag{7} $$

$$ P_n(k) \sim e^{-n |k|^{\beta}} \tag{8} $$

Equation 8 is called the Fourier transform of a Levy stable law. These laws have a somewhat more
general form but this need not concern us here. The Levy stable laws are best characterized in
Fourier space, Equation 8, and cannot in general be calculated in a useful closed form analytic
solution.

The analysis can easily be given for a D dimensional cubic lattice, but instead we now connect
it to the Weierstrass function by choosing a particular form for $p(l)$.

Let

$$ p(l) = \frac{a-1}{2a} \sum_{n=0}^{\infty} a^{-n} \left[ \delta_{l,-b^n} + \delta_{l,b^n} \right], \qquad (a, b > 1) \tag{9} $$

This allows jumps of all orders of magnitude in base b, with each order of magnitude longer
jump occurring with an order of magnitude smaller probability in base a. Then

$$ p(k) = \frac{a-1}{a} \sum_{n=0}^{\infty} a^{-n} \cos(b^n k) \tag{10} $$

which is the Weierstrass function. It satisfies the scaling equation

$$ p(k) = a^{-1} p(bk) + \frac{a-1}{a} \cos k \tag{11} $$

Any singular behavior of $p(k)$ must satisfy

$$ p_{\mathrm{sing}}(k) = a^{-1} p_{\mathrm{sing}}(bk) \tag{12} $$

which has the solution

$$ p_{\mathrm{sing}}(k) = k^{\beta} Q(k) \tag{13} $$

where

$$ \beta = \ln a / \ln b, \qquad 0 < \beta < 2 \tag{14} $$

and $Q(k)$ is a function periodic in $\ln k$ with period $\ln b$. It can be shown(5) that the full solution
to Equation 11 is

$$ p(k) = k^{\beta} Q(k) + \frac{a-1}{a} \sum_{n=0}^{\infty} \frac{(-1)^n k^{2n}}{(2n)!\, (1 - a^{-1} b^{2n})} $$

with

$$ Q(k) = \frac{a-1}{a \ln b} \sum_{n=-\infty}^{\infty} \Gamma\!\left( -\beta + \frac{2n\pi i}{\ln b} \right) \cos\!\left[ \frac{\pi}{2} \left( -\beta + \frac{2n\pi i}{\ln b} \right) \right] \exp\!\left[ -\frac{2n\pi i \ln k}{\ln b} \right] \tag{15} $$

It was the Weierstrass function of Equations 11 and 15 that Hardy established had a finite
derivative at no value of k when $b \geq a$.

In this regime we see numerically how the wiggles grow upon wiggles until the function becomes
nondifferentiable. See Figures 1 and 2.
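The scaling relation of Equation 11 is easy to probe on partial sums of Equation 10. The following sketch is an illustration only, with hypothetical function names: for a sum truncated after N terms the two sides of Equation 11 differ by exactly one term of size at most $(a-1) a^{-N-1}$, which the residual function exposes. The values a = 3, b = 8 are example choices in the regime $b \geq a$.

```python
import math


def weierstrass(k, a, b, terms):
    """Partial sum of Eq. 10: ((a-1)/a) * sum_{n=0}^{terms-1} a^{-n} cos(b^n k)."""
    return (a - 1) / a * sum(a ** (-n) * math.cos(b ** n * k) for n in range(terms))


def scaling_residual(k, a, b, terms):
    """Left side minus right side of Eq. 11, both evaluated with truncated sums."""
    lhs = weierstrass(k, a, b, terms)
    rhs = weierstrass(b * k, a, b, terms) / a + (a - 1) / a * math.cos(k)
    return lhs - rhs


def levy_exponent(a, b):
    """The exponent beta = ln a / ln b of Eq. 14."""
    return math.log(a) / math.log(b)


if __name__ == "__main__":
    a, b = 3, 8
    print("beta =", levy_exponent(a, b))
    for terms in (1, 2, 4, 8):
        print(terms, weierstrass(0.37, a, b, terms), scaling_residual(0.37, a, b, terms))
```

Plotting `weierstrass` on a fine grid of k for increasing `terms` reproduces the qualitative picture of wiggles accumulating on every scale.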


3.  THE  SPHERICALLY  SYMMETRIC  CONTINUUM  WEIERSTRASS  RANDOM  WALK 

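In D dimensions let the random walker take spherically symmetric jumps of variable length

$$ p(x) = \frac{p_1(|x|)}{S_D |x|^{D-1}}, \qquad 0 < |x| < \infty \tag{16} $$

where $S_D = 2\pi^{D/2}/\Gamma(D/2)$ is the surface area of a unit hypersphere.
Fourier transforming the radial function $p(x)$, one obtains

$$ p(k) = \Gamma\!\left(\tfrac{D}{2}\right) \int_0^{\infty} \left( \tfrac{1}{2} |k| \lambda \right)^{1 - D/2} J_{D/2 - 1}(|k| \lambda)\, p_1(\lambda)\, d\lambda \tag{17} $$

In analogy to the Weierstrass walk we choose

$$ p_1(|x|) = \frac{a-1}{a} \sum_{n=0}^{\infty} a^{-n}\, \delta(|x| - b^n), \qquad b > 1 \tag{18} $$

which when substituted into Equation 17 yields

$$ p(k) = \frac{a-1}{a}\, \Gamma\!\left(\tfrac{D}{2}\right) \sum_{n=0}^{\infty} a^{-n} \left( \tfrac{1}{2} |k| b^n \right)^{1 - D/2} J_{D/2 - 1}(|k| b^n) \tag{19} $$

which satisfies the scaling equation

$$ p(k) = a^{-1} p(bk) + \frac{a-1}{a}\, \Gamma\!\left(\tfrac{D}{2}\right) \left( \tfrac{1}{2} |k| \right)^{1 - D/2} J_{D/2 - 1}(|k|) $$

which can be shown to have the solution

$$ p(k) = \frac{a-1}{2a \ln b}\, \Gamma\!\left(\tfrac{D}{2}\right) \left( \tfrac{1}{2} k \right)^{\beta} \sum_{m=-\infty}^{\infty} \frac{\Gamma(-\tfrac{1}{2}\beta + m\pi i/\ln b)}{\Gamma(\tfrac{1}{2} D + \tfrac{1}{2}\beta - m\pi i/\ln b)} \exp\!\left[ -\frac{2 m \pi i \ln(\tfrac{1}{2} k)}{\ln b} \right] $$

$$ \qquad + \frac{a-1}{a}\, \Gamma\!\left(\tfrac{D}{2}\right) \sum_{n=0}^{\infty} \frac{(-1)^n (\tfrac{1}{2} k)^{2n}}{n!\, \Gamma(n + \tfrac{1}{2} D)\, (1 - a^{-1} b^{2n})}, \qquad (0 < \beta < 2) \tag{20} $$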


It is Equation 19 which is hypothesized to be a proper generalization of the Weierstrass function.
Let us now consider its differentiability. Since $p(k) \sim 1 - O(k^{\beta})$, differentiability holds at
$k = 0$ if $\beta > 1$. It can be shown that if $\beta > \tfrac{1}{2}(3 - D)$, then $p(k)$ is differentiable with respect
to k for all $k > 0$. This corresponds to $\beta > 1$ in one dimension, $\beta > 1/2$ in two dimensions, and
$\beta > 0$ in three dimensions. As the one-dimensional case reduces to the Weierstrass function, we
focus on the two-dimensional case

$$ p(k) = \frac{a-1}{a} \sum_{n=0}^{\infty} a^{-n} J_0(k b^n) \tag{21} $$

For  the  above  equation  we  again  see  numerically  that  for  b  >  a  wiggles  grow  upon  wiggles 
until  the  function  becomes  nondifferentiable.  See  Figures  3,  4,  and  5. 
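Partial sums of Equation 21 can be evaluated without special-function libraries. The sketch below is illustrative and uses hypothetical names: it assumes the standard integral representation $J_0(x) = \pi^{-1} \int_0^{\pi} \cos(x \sin\theta)\, d\theta$, evaluated by the midpoint rule with a number of nodes that grows with |x|, and it checks the two-dimensional scaling equation on truncated sums. The helper `differentiable_for_all_k` simply encodes the criterion $\beta > (3 - D)/2$ quoted in the text.

```python
import math


def bessel_j0(x):
    """J_0(x) via the midpoint rule applied to (1/pi) * integral of cos(x sin(theta)) over [0, pi]."""
    m = max(64, int(2 * abs(x)) + 64)   # node count grows with |x| to resolve the oscillations
    return sum(math.cos(x * math.sin(math.pi * (j + 0.5) / m)) for j in range(m)) / m


def p2d(k, a, b, terms):
    """Partial sum of Eq. 21: ((a-1)/a) * sum_{n=0}^{terms-1} a^{-n} J_0(k b^n)."""
    return (a - 1) / a * sum(a ** (-n) * bessel_j0(k * b ** n) for n in range(terms))


def scaling_residual_2d(k, a, b, terms):
    """Left side minus right side of the D = 2 scaling equation on truncated sums."""
    lhs = p2d(k, a, b, terms)
    rhs = p2d(b * k, a, b, terms) / a + (a - 1) / a * bessel_j0(k)
    return lhs - rhs


def differentiable_for_all_k(beta, dim):
    """Criterion quoted in the text: differentiable for all k > 0 if beta > (3 - D)/2."""
    return beta > 0.5 * (3 - dim)


if __name__ == "__main__":
    for terms in (1, 2, 3, 5):
        print(terms, p2d(0.7, 3, 4, terms), scaling_residual_2d(0.7, 3, 4, terms))
```

The residual shrinks like $a^{-N}$ with the number of terms N, mirroring the one-dimensional case.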
Figure 2. Plots from Equation 10 with (a) one term, (b) two terms, etc., with a = 8 and
b = 3 showing the lack of growth of wiggles.

Figure 3. Plots from Equation 21 with (a) one term, (b) two terms, etc., with a = 3,
b = 8 showing the nondifferentiability of the three-dimensional random walk.

Figure 5. Plot from Equation 21 with six terms showing the transition from a smooth to
a nonsmooth function (2 < a, t < 7).
A FAST ALGORITHM FOR THE MULTIPLICATION OF GENERALIZED
HILBERT MATRICES WITH VECTORS†


Apostolos  Gerasoulis 
Department  of  Computer  Science 
Rutgers  University 
New  Brunswick,  NJ  08903 


Abstract. We describe an algorithm with an $O_A(n (\log n)^2)$ time complexity for the
multiplication of generalized Hilbert matrices with vectors. These matrices are defined by
$(B_p)_{ij} = 1/(t_i - s_j)^p$, $i, j = 1, \ldots, n$ and $p = 1, 2$, where $t_i$ and $s_j$ are distinct elements and
$t_i \neq s_j$, $i, j = 1, \ldots, n$. An implementation of the algorithm for the Chebyshev points, which arise
in the numerical approximation of Cauchy singular integral equations, is presented. The time
complexity of the algorithm, for this special set of points, reduces to $O_A(n \log n)$.

1.  Introduction: 

Let us define the matrices $B_p$, $p = 1, 2$, by

$$ (B_p)_{ij} = \frac{1}{(t_i - s_j)^p}, \qquad i, j = 1, \ldots, n \tag{1} $$

where $t_i$ and $s_j$ are distinct elements and $t_i \neq s_j$, $i, j = 1, \ldots, n$. The special case $p = 1$, $t_i = i$
and $s_j = -j + 1$, is the well known Hilbert matrix

$$ (B_1)_{ij} = \frac{1}{i + j - 1}, \qquad i, j = 1, \ldots, n. \tag{2} $$

In [17], the following question was considered:

"Given a vector y. Does there exist an algorithm for computing the product Ty in less
than $O_A(n^2)$ operations?"

where the matrix T is given by

$$ T_{ij} = \begin{cases} \dfrac{1}{c_i - c_j} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}, \qquad i, j = 1, \ldots, n \tag{3} $$

with distinct $c_i$, and where the time or space complexity $O_A(f(n))$ is defined in (Aho et al. [1],
pp. 19-22). This problem was initially posed by Golub in [18] and [19] and it is known as
Trummer's problem. It has generated great interest because of various applications in the
computation of conformal mappings (Trummer et al. [27], [32]), the zeta function (Odlyzko and
Schonhage [30]), and the numerical evaluation of singular integrals (this paper).


In [17], we have proposed an $O_A(n (\log n)^2)$ algorithm for Trummer's problem, henceforth the
GGS algorithm. The GGS algorithm uses Fast Fourier Transform (FFT) polynomial
multiplication, polynomial interpolation and polynomial evaluation at n distinct points.

† This material is based upon work supported by the National Science Foundation under Grant No. DMS-8606464.

In Section 2, we show that the GGS algorithm can be extended to include the matrices defined
in (1). The time complexity of the extended algorithm is the same as the GGS. In Section 3,
we present examples of generalized Hilbert matrices similar to (1)-(3), which arise in the
numerical approximation of singular integrals. In Section 4, we implement the extended
algorithm for the points $s_j = \cos(j\pi/n)$, $j = 1, \ldots, n-1$ and $t_i = \cos((2i-1)\pi/2n)$, $i = 1, \ldots, n$,
which arise in the numerical approximation of Cauchy singular integral equations. The time
complexity of the algorithm, for this special set of points, reduces to $O_A(n \log n)$. Finally, in the
Appendix, we present a collection of problems for which the new fast algorithm could be used
to speed up computations.


2.  An  extended  GGS  algorithm 

In this section, we briefly describe an extension to the GGS algorithm for the multiplication of
$B_p$ with a vector.

We first notice that the problem of multiplying $B_p y$ is equivalent to evaluating the function
$f_p(x)$ at the points $t_i$, $i = 1, \ldots, n$, where

$$ f_p(x) = \sum_{j=1}^{n} \frac{y_j}{(x - s_j)^p}, \qquad p = 1, 2. \tag{4} $$

Since $f_2(x) = -f_1'(x)$, we only need to consider $f_1(x)$.

We follow Gastinel [13] and express $f_1(x)$ as the ratio of two polynomials $h(x)$ and $g(x)$, where
$g(x)$ is an n-th degree polynomial defined by

$$ g(x) = \prod_{j=1}^{n} (x - s_j) \tag{5} $$

and $h(x)$ is a polynomial, of degree at most n-1, determined from

$$ f_1(x) = \frac{h(x)}{g(x)} = \sum_{j=1}^{n} \frac{y_j}{x - s_j}. \tag{6} $$

By multiplying (6) by $g(x)$ and setting $x = s_j$, $j = 1, \ldots, n$, we derive

$$ h(s_j) = y_j\, g'(s_j), \qquad j = 1, \ldots, n \tag{7} $$

which implies that $h(x)$ is the interpolation polynomial at the points $(s_j, h(s_j))$, $j = 1, \ldots, n$.

It is clear now that the matrix multiplication problem, $B_p y$, is equivalent to evaluating the
functions

$$ f_1(t_i) = \frac{h(t_i)}{g(t_i)}, \qquad f_2(t_i) = \frac{h(t_i)\, g'(t_i)}{g^2(t_i)} - \frac{h'(t_i)}{g(t_i)}, \qquad i = 1, \ldots, n, \tag{8} $$

while Trummer's problem, Ty, is equivalent to evaluating (Gerasoulis et al. [17])

$$ (Ty)_j = \frac{h'(c_j) - \tfrac{1}{2}\, y_j\, g''(c_j)}{g'(c_j)}, \qquad j = 1, \ldots, n, \tag{9} $$

where $g$ and $h$ are formed with the points $c_j$ in place of $s_j$.

It should be mentioned here that for $p \geq 3$, the multiplication of $B_p y$ is equivalent to relations
similar to (8). These relations may be derived from the identity $f_p(x) = f'_{p-1}(x)/(1-p)$,
$p = 3, 4, \ldots$ .


We now describe an efficient algorithm for the evaluation of $f_1(t_i)$ and $f_2(t_i)$, $i = 1, \ldots, n$.

Procedure FAST(n, t, s, y); return $f_1$, $f_2$;

1. Compute the coefficients of $g(x)$ in its power form, by using FFT polynomial
multiplication, in $O_A(n (\log n)^2)$ time (e.g. Horowitz [22], Aho et al. [1], Theorem
8.14, p. 299);

2. Compute the coefficients of $g'(x)$ in $O_A(n)$ time;

3. Evaluate $g(t_i)$ and $g'(t_i)$, $i = 1, \ldots, n$, and $g'(s_j)$, $j = 1, \ldots, n$, in $O_A(n (\log n)^2)$ time (Aho
et al. [1], Corollary 2, p. 294);

4. Compute $h(s_j) = y_j\, g'(s_j)$, $j = 1, \ldots, n$, in $O_A(n)$ time;

5. Find the interpolation polynomial $h(x)$ at the points $(s_j, h(s_j))$, $j = 1, \ldots, n$, in
$O_A(n (\log n)^2)$ time (Aho et al. [1], Theorem 8.14, p. 299);

6. Compute the coefficients of $h'(x)$ in $O_A(n)$ time, and evaluate
$h(t_i)$ and $h'(t_i)$, $i = 1, \ldots, n$, in $O_A(n (\log n)^2)$ time, by following the same technique
as in steps 2 and 3;

7. Compute $f_1(t_i) = h(t_i)/g(t_i)$ and $f_2(t_i) = -(h'(t_i)\, g(t_i) - h(t_i)\, g'(t_i))/g^2(t_i)$, $i = 1, \ldots, n$, in
$O_A(n)$ time;

end FAST;

The space and time complexity of FAST are $O_A(n \log n)$ and $O_A(n (\log n)^2)$, respectively. In
Section 4, we consider two important special cases for which the time complexity of FAST
reduces to $O_A(n \log n)$.
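The seven steps of FAST can be followed in a few lines of code. The sketch below is only an illustration of the data flow and is not the author's implementation: it deliberately replaces the FFT-based polynomial multiplication, evaluation and interpolation with naive $O(n^2)$ routines, so it has none of the speed of FAST, and it is suitable only for small, well-separated point sets. All function names are hypothetical.

```python
def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the monic polynomial with the given roots."""
    c = [1.0]
    for r in roots:
        nxt = [0.0] * (len(c) + 1)
        for i, ci in enumerate(c):
            nxt[i + 1] += ci
            nxt[i] -= r * ci
        c = nxt
    return c


def poly_eval(c, x):
    """Horner evaluation; an empty coefficient list is the zero polynomial."""
    acc = 0.0
    for ci in reversed(c):
        acc = acc * x + ci
    return acc


def poly_deriv(c):
    return [i * ci for i, ci in enumerate(c)][1:]


def divide_by_linear(c, r):
    """Quotient of c(x) by (x - r) via synthetic division, assuming r is a root of c."""
    n = len(c) - 1
    q = [0.0] * n
    acc = 0.0
    for i in range(n, 0, -1):
        acc = c[i] + r * acc
        q[i - 1] = acc
    return q


def interpolate(nodes, values, g, dg_at_nodes):
    """Lagrange interpolation written as sum_j values_j / g'(s_j) * g(x) / (x - s_j)."""
    h = [0.0] * len(nodes)
    for x, v, d in zip(nodes, values, dg_at_nodes):
        for i, qi in enumerate(divide_by_linear(g, x)):
            h[i] += v / d * qi
    return h


def fast_sketch(t, s, y):
    g = poly_from_roots(s)                          # step 1
    dg = poly_deriv(g)                              # step 2
    dg_s = [poly_eval(dg, sj) for sj in s]          # step 3
    h_s = [yj * d for yj, d in zip(y, dg_s)]        # step 4, Eq. (7)
    h = interpolate(s, h_s, g, dg_s)                # step 5
    dh = poly_deriv(h)                              # step 6
    f1, f2 = [], []
    for ti in t:                                    # step 7
        gv, dgv = poly_eval(g, ti), poly_eval(dg, ti)
        hv, dhv = poly_eval(h, ti), poly_eval(dh, ti)
        f1.append(hv / gv)
        f2.append((hv * dgv - dhv * gv) / gv ** 2)
    return f1, f2


def direct(t, s, y, p):
    """Direct O(n^2) product B_p y for comparison."""
    return [sum(yj / (ti - sj) ** p for sj, yj in zip(s, y)) for ti in t]
```

Comparing `fast_sketch(t, s, y)` with `direct(t, s, y, 1)` and `direct(t, s, y, 2)` on a handful of points is a quick sanity check of Eqs. (6) to (8). The power-form coefficients used here become ill-conditioned as n grows, which is one reason the stability of the general algorithm is treated as open in the concluding remarks.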

3.  Generalized  Hilbert  matrices 

In  this  section,  we  present  generalized  Hilbert  matrices  which  arise  in  the  quadrature 
approximation  of  Cauchy  singular  integrals.  Additional  problems  for  which  FAST  is  applicable 
are  given  in  the  Appendix. 

We consider the Cauchy principal value integral

$$ I(y; s) = \int_{-1}^{1} w(t)\, \frac{y(t)}{t - s}\, dt, \qquad |s| < 1 \tag{10} $$

where $w(t)$ is a weight function defined by

$$ w(t) = (1 - t)^{\alpha} (1 + t)^{\beta} \tag{11} $$

and its derivative

$$ I'(y; s) = \frac{d\, I(y; s)}{ds}, \qquad |s| < 1 \tag{12} $$

where it is assumed that $w(t)$ and $y(t)$ are such that the derivative of $I(y; s)$ exists. The
singular integrals defined by (10) and (12) arise in fields such as aerodynamics, wave-guide
theory, scattering, fracture mechanics and others (see Appendix). For example, in fracture
mechanics the solution of the equation

$$ \frac{1}{\pi} \int_{-1}^{1} (1 - t^2)^{-1/2}\, \frac{y(t)}{t - s}\, dt = f(s), \qquad |s| < 1 \tag{13} $$

represents the derivative of the crack opening under a given pressure distribution $f(s)$ along
$(-1, 1)$.

We will now derive quadratures for the singular integrals (10) and (12). These quadratures give
rise to matrices similar to $B_p$ and T. We only need to consider the quadrature approximation
of (10). Quadratures for $I'(y; s)$ and matrices similar to $B_2$ can be obtained from the
quadratures for $I(y; s)$ via (12).

By rewriting (10) as

$$ I(y; s) = \int_{-1}^{1} w(t)\, \frac{y(t) - y(s)}{t - s}\, dt + y(s) \int_{-1}^{1} \frac{w(t)}{t - s}\, dt \tag{14} $$

we see that classical quadratures may be used to approximate $I(y; s)$, provided that the second
integral in (14) can be computed to within any given tolerance. This computation may be
performed once via a very fast convergent series of the Hypergeometric function. Then, the first
integral can be computed via a quadrature for different functions $y(t)$. These computations may
be performed by using procedure FAST.

For simplicity, we consider below only two cases of the weight function $w(t)$. We will use the
trapezoidal and Gauss-Chebyshev quadratures for the approximation of $I(y; s)$. The analysis can
easily be extended to the general weight function $w(t)$ in (11).

The case $\alpha = \beta = 0$:

Here, the weight function $w(t) = 1$. By using the trapezoidal rule for the approximation of (14),
we derive

$$ I_n(y; s) = \sum_{i=0}^{n} w_i\, \frac{y(t_i) - y(s)}{t_i - s} + y(s) \log \left| \frac{1 - s}{1 + s} \right| \tag{15} $$

where $t_i = -1 + ih$, $i = 0, 1, \ldots, n$, $h = 2/n$, $w_0 = w_n = h/2$ and $w_i = h$, $i = 1, \ldots, n-1$.
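The singularity-subtracted trapezoidal rule of (15) can be tried on a case with a known answer. The sketch below is an illustration with hypothetical names; it assumes that the evaluation point s is not one of the nodes $t_i$ (the coincident case is the one treated by (16)). For $y(t) = t^2$ the difference quotient is the linear function $t + s$, which the trapezoidal rule integrates exactly, so the rule should return $2s + s^2 \log|(1-s)/(1+s)|$ up to rounding.

```python
import math


def trapezoid_cauchy(y, s, n):
    """Eq. (15): trapezoidal approximation of the principal value integral of y(t)/(t - s)
    over (-1, 1) with unit weight, for a point s that is not a quadrature node."""
    h = 2.0 / n
    total = 0.0
    for i in range(n + 1):
        t = -1.0 + i * h
        if t == s:
            raise ValueError("s coincides with a node; this sketch does not cover that case")
        w = h / 2 if i in (0, n) else h
        total += w * (y(t) - y(s)) / (t - s)
    return total + y(s) * math.log(abs((1 - s) / (1 + s)))


if __name__ == "__main__":
    s = 0.3
    exact = 2 * s + s * s * math.log((1 - s) / (1 + s))
    print(trapezoid_cauchy(lambda t: t * t, s, 7), exact)
```

For less regular y(t) the first sum converges only at the usual trapezoidal rate, which is why large n, and hence a fast matrix-vector product, matters.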

Trummer's matrix T can be derived by setting $s = t_j$, $j = 1, \ldots, n-1$, in (15)

$$ I_n(y; t_j) = \sum_{\substack{i=0 \\ i \neq j}}^{n} \frac{w_i\, y(t_i)}{t_i - t_j} + y(t_j) \sum_{\substack{i=0 \\ i \neq j}}^{n} \frac{w_i}{t_j - t_i} + w_j\, y'(t_j) + y(t_j) \log \left| \frac{1 - t_j}{1 + t_j} \right|, \qquad j = 1, \ldots, n-1 \tag{16} $$

where we have assumed that $y'(t_j)$ exists.

The case $\alpha = \beta = -1/2$:

For this special case the weight function becomes $w(t) = (1 - t^2)^{-1/2}$. By using the
Gauss-Chebyshev quadrature and the identities (Erdogan et al. [12])

$$ \frac{1}{n} \sum_{i=1}^{n} \frac{1}{t_i - s} = -\frac{U_{n-1}(s)}{T_n(s)}, \qquad \int_{-1}^{1} \frac{(1 - t^2)^{-1/2}}{t - s}\, dt = 0, \qquad |s| < 1 \tag{17} $$

we see that $I(y; s)$ is approximated by

$$ I_n(y; s) = \frac{\pi}{n} \sum_{i=1}^{n} \frac{y(t_i) - y(s)}{t_i - s} = \frac{\pi}{n} \sum_{i=1}^{n} \frac{y(t_i)}{t_i - s} + \pi\, \frac{U_{n-1}(s)}{T_n(s)}\, y(s) \tag{18} $$

where $t_i = \cos((2i-1)\pi/2n)$, $i = 1, \ldots, n$, are the zeros of $T_n(t)$ and
$s_j = \cos(j\pi/n)$, $j = 1, \ldots, n-1$, are the zeros of $U_{n-1}(s)$, and where $T_n(t)$ and $U_{n-1}(s)$ are the
Chebyshev polynomials of the first and second kind, respectively.

Now, by setting $s = s_j$, $j = 1, \ldots, n-1$, in (18) we obtain a matrix similar to $B_1$

$$ I_n(y; s_j) = \pi n^{-1} \sum_{i=1}^{n} \frac{y(t_i)}{t_i - s_j}, \qquad j = 1, \ldots, n-1 \tag{19} $$

while by setting $s = t_j$, $j = 1, \ldots, n$, we obtain T and summations similar to (16).

Next, we use (19) and FAST to obtain a numerical solution for equation (13).

4. An application of FAST

In this section, we present an $O_A(n \log n)$ implementation of FAST for the points
$t_i = \cos((2i-1)\pi/2n)$, $i = 1, \ldots, n$, and $s_j = \cos(j\pi/n)$, $j = 1, \ldots, n-1$, which arise in (18). For
the points $t_i = -1 + ih$, $i = 1, \ldots, n-1$, we do not need to use FAST, because
$t_i - t_j = (i-j)h$, and therefore the sums in (16) can be computed directly via FFT
convolutions in $O_A(n \log n)$ time (Brigham [6], Henrici [21]).

We now consider the numerical solution of (13). By using (19) to approximate (13), we obtain
the $(n-1) \times n$ algebraic system

$$ A y = f, \qquad A_{ji} = \frac{1}{n (t_i - s_j)}, \qquad j = 1, \ldots, n-1, \quad i = 1, \ldots, n \tag{20} $$

where $y = [y(t_1), y(t_2), \ldots, y(t_n)]^T$ and $f = [f(s_1), f(s_2), \ldots, f(s_{n-1})]^T$. Since A possesses a right
inverse $A^I$ (Gerasoulis [14]), the solution of (20) can be obtained from

$$ y = A^I f + \delta_n u_n, \qquad (A^I)_{ij} = \frac{1 - s_j^2}{n (t_i - s_j)}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, n-1 \tag{21} $$

where $\delta_n$ is an arbitrary constant determined by an additional condition imposed on the
solution $y(t)$ and $u_n$ is a vector with all elements equal to one. We are now ready to apply
FAST for the computation of

$$ (A^I f)_i = n^{-1} \sum_{j=1}^{n-1} \frac{(1 - s_j^2)\, f(s_j)}{t_i - s_j}, \qquad i = 1, \ldots, n. \tag{22} $$

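The algorithm presented below is a modification of FAST for the INPUT:
$t_i = \cos((2i-1)\pi/2n)$, $i = 1, \ldots, n$, $s_j = \cos(j\pi/n)$ and $f(s_j)$, $j = 1, \ldots, n-1$.

1. Instead of computing $g(x)$ in its power form, we will use its product form directly.
We have $g(x) = \prod_{j=1}^{n-1} (x - s_j) = c_n U_{n-1}(x) = c_n \sin(n\theta)/\sin(\theta)$, where $c_n = 2^{1-n}$
and $\cos(\theta) = x$.

2. Similarly, $g'(x) = -c_n [n T_n(x) - x U_{n-1}(x)]/(1 - x^2)$.

3. Since $g'(s_j) = c_n (-1)^{j+1} n/(1 - s_j^2)$, the computational complexity for this step reduces
to $O_A(n)$ time.

4. Equation (7), with $y_j = n^{-1} (1 - s_j^2) f(s_j)$ taken from (22), and step 3 above imply that
$h(s_j) = c_n (-1)^{j+1} f(s_j)$, $j = 1, \ldots, n-1$.

5. We find $h(x)$ by using orthogonal polynomial interpolation. We set

$$ h(x) = \sum_{k=1}^{n-1} a_k U_{k-1}(x) \tag{23} $$

and use the orthogonality identities, which hold for all integers l, m such that
$l + m < 2n - 1$,

$$ \sum_{j=1}^{n-1} (1 - s_j^2)\, U_{l-1}(s_j)\, U_{m-1}(s_j) = \begin{cases} n/2 & \text{if } l = m \\ 0 & \text{if } l \neq m \end{cases} \tag{24} $$

to obtain

$$ a_k = 2 n^{-1} \sum_{j=1}^{n-1} (1 - s_j^2)\, h(s_j)\, U_{k-1}(s_j) = \sum_{j=1}^{n-1} z_j \sin\!\left( \frac{k j \pi}{n} \right), \qquad z_j = 2 n^{-1} h(s_j) \sin\!\left( \frac{j \pi}{n} \right) \tag{25} $$

for $k = 1, \ldots, n-1$. The coefficients $a_k$, $k = 1, \ldots, n-1$, can be computed via FFT in
$O_A(n \log n)$ time (Aho et al. [1]).

6. From step 1 and (23), we obtain

$$ g(t_i) = \frac{c_n (-1)^{i+1}}{\sin((2i-1)\pi/2n)}, \qquad h(t_i) = \frac{1}{\sin((2i-1)\pi/2n)} \sum_{k=1}^{n-1} a_k \sin\!\left( \frac{k (2i-1) \pi}{2n} \right) \tag{26} $$

for $i = 1, \ldots, n$.

7. Finally, $A^I f$ is computed from

$$ (A^I f)_i = \frac{h(t_i)}{g(t_i)} = \frac{(-1)^{i+1}}{c_n} \sum_{k=1}^{n-1} a_k \sin\!\left( \frac{k (2i-1) \pi}{2n} \right) \tag{27} $$

for $i = 1, \ldots, n$, by using FFT in $O_A(n \log n)$ time.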

The total cost of the above implementation of FAST is $O_A(n \log n)$, since it only requires the
application of FFT twice. In Table 4.1, we present our computational experience with FAST
for the function $f(s) = 1$. For this case the summations can be obtained exactly from the
identity (e.g. Erdogan et al. [12])

$$ n^{-1} \sum_{j=1}^{n-1} \frac{1 - s_j^2}{t_i - s_j} = t_i, \qquad i = 1, \ldots, n. \tag{28} $$

The computations have been performed by using the subroutines SINT and SINQF from
FFTPACK of NETLIB (Swarztrauber [31]), which are most efficient whenever $n = 2^k$. As
expected, FAST outperforms the $O_A(n^2)$ algorithm for all n > 32 (Brigham [6], p. 152). The
Table also shows that in addition to its performance, FAST attains better accuracy than the
$O_A(n^2)$ algorithm. Similar results have been obtained for several other choices of $f(s)$.
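The two sine sums at the heart of this section can be written out directly. The following sketch is illustrative and uses hypothetical names: it evaluates the sums of (25) and (27) by plain $O(n^2)$ loops rather than by the FFTPACK sine transforms used in the paper, with the constant $c_n$ divided out since it cancels between (25) and (27), and it compares the result with the direct sum (22).

```python
import math


def chebyshev_points(n):
    """t_i: zeros of T_n (i = 1..n); s_j: zeros of U_{n-1} (j = 1..n-1)."""
    t = [math.cos((2 * i - 1) * math.pi / (2 * n)) for i in range(1, n + 1)]
    s = [math.cos(j * math.pi / n) for j in range(1, n)]
    return t, s


def apply_right_inverse_direct(f_vals, n):
    """Eq. (22) evaluated as a plain double loop."""
    t, s = chebyshev_points(n)
    return [sum((1 - sj * sj) * fj / (ti - sj) for sj, fj in zip(s, f_vals)) / n
            for ti in t]


def apply_right_inverse_sine(f_vals, n):
    """Steps 4 to 7 with c_n divided out; the two sine sums are the parts
    that an FFT-based sine transform would accelerate."""
    # step 4 and the z_j of Eq. (25): h(s_j)/c_n = (-1)^(j+1) f(s_j)
    z = [2.0 / n * (-1) ** (j + 1) * f_vals[j - 1] * math.sin(j * math.pi / n)
         for j in range(1, n)]
    # Eq. (25): first sine sum
    a = [sum(z[j - 1] * math.sin(k * j * math.pi / n) for j in range(1, n))
         for k in range(1, n)]
    # Eq. (27): second sine sum at the angles (2i - 1) pi / (2n)
    out = []
    for i in range(1, n + 1):
        theta = (2 * i - 1) * math.pi / (2 * n)
        out.append((-1) ** (i + 1) * sum(a[k - 1] * math.sin(k * theta) for k in range(1, n)))
    return out


if __name__ == "__main__":
    n = 8
    t, s = chebyshev_points(n)
    print(apply_right_inverse_sine([1.0] * (n - 1), n))
    print(t)   # identity (28): the two lists should agree
```

With $f(s) = 1$ both routines should reproduce the points $t_i$ themselves, which is the exact check used for Table 4.1.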

Note. All computations were performed on a DECSYSTEM/2060T using FORTRAN 77 with
single precision floating point arithmetic (with a mantissa of approximately 8 decimals and with
an exponent in the range $0.14 \times 10^{-38}$ to $1.7 \times 10^{38}$).

5.  Concluding  remarks 

In  Section  4,  we  have  described  an  efficient  and  stable  implementation  of  FAST  for  a  special 
set  of  points  which  arise  in  numerous  applications  (see  Appendix).  The  implementation  and 
stability  properties  of  FAST  for  an  arbitrary  set  of  points  still  remain  to  be  addressed. 

The computational advantages of FAST over direct multiplication algorithms become apparent
if n is very large (see Table 4.1) or if the product $B_p y$ is repeatedly computed. In the case of
repeated computation of $B_p y$ the advantages, in terms of actual CPU execution time, are even
more pronounced. For example, in the quadrature approximation of two dimensional singular
integrals, using the Chebyshev points, the complexity via FAST is $O_A(n^2 \log n)$ as opposed to
$O_A(n^3)$ via the direct algorithm. Thus, the times given in Table 4.1 will have to be multiplied
by n.
[Table 4.1: The performance of FAST for f(s) = 1. Columns: n; time and maximum error for the $O_A(n \log n)$ algorithm and for the direct algorithm.]
I. Appendix

In this Appendix, we briefly discuss iterative algorithms for the solution of singular integral
equations, for which FAST could be used to speed up computations.

1. Consider the Cauchy singular integral equation

$$ \frac{1}{\pi} \int_{-1}^{1} w(t)\, \frac{y(t)}{t - s}\, dt + \lambda \int_{-1}^{1} w(t)\, K(t, s)\, y(t)\, dt = f(s), \qquad |s| < 1 \tag{29} $$

where $w(t)$ is a weight function, and $K(t, s)$ and $f(s)$ are given input functions.

By  approximating  the  last  equation  via  a  quadrature  similar  to  the  one  described  for  equation 
(13),  we  obtain  the  algebraic  system 


w. 


(A  +  AC)y  =  f,  (A)..  =  t_i; 


(C)jt  =  nu>i  j  =  1,  ....  m,  i  =  1,  ....  n 


(SO) 


where $w_i$ are the quadrature weights (e.g. Gerasoulis [14]). The last algebraic system can be
solved by using either a direct or an iterative method. Iterative methods such as the generalized
conjugate gradient (Concus and Golub [9], Trummer [32]), Nystrom's iterative variants
(Gerasoulis [15], [16]), residual correction (Hashimoto [20]) and successive approximation
(Tsamasphyros and Theocaris [33], Ioakimidis [24]), could be used to solve (30), particularly if
the dimension of the system is large. As an example, we present the following iterative method
for (29) (Tsamasphyros and Theocaris [33]),


n  n-1U-«J) 

F,(t)  =  f(t)  +  A  an’*  £*(!,,!)£ -^ /•,_,(«<),  /  =  F„(0  =  /(*)  (SI) 

1=1  i-l  i  * 

where  Zt-  =  cos  ((2* — l)*r/2n),  «  =  1,  ...,  n,  t.=  cos  (y'x/n),  j  =  1,  ...,  n-1,  and  w(t)  =  (l-t2)-1/2. 

The  solution  y(t),  for  |(|  <  1,  is  obtained  from  Ft(t)  via  a  summation,  e.g. 

v(l)  «  »M(1)  =  n_lfw(1)  +  1  (*,41)^W(*J)>  ^  M  sufficiently  large. 

Now, if $K(t, s)$ is such that the resulting matrix C is similar to $B_p$, then we can apply FAST.
This is indeed the case for the following singular integral equations.

(a)  The  Interface  Crack  equation: 

The  weight  function  is  given  by  w(t)  =  (1-t2)1/2  and  the  kernel  by 

(l_72t2)l/2(1_  2,2)-l/2_i 

*(«,.)  =  t  - 

where $\delta$ and $\gamma$ are given material constants. This equation has received considerable attention in
the literature. Comninou [8] has obtained a numerical solution for $1 - 10^{-4} < \gamma < 1 - 10^{-7}$, by
using direct methods for the linear algebraic system (30). Because the dimension of the
algebraic system becomes very large for $1 > \gamma > 1 - 10^{-7}$, direct methods fail and the solution is
not known in this region. As a result, certain important physical quantities, such as the stress
intensity factor, had to be conjectured (Comninou [8], pp. 633-634). We refer the reader to
C. Atkinson et al. [2] for a recent discussion regarding this problem. An iterative method,
such as the ones discussed above, together with the $O_A(n \log n)$ implementation of FAST
presented in Section 4, may be found useful in solving this equation.

(b)  Equations  with  generalized  kernels 

These equations arise in wave-guide theory, elasticity and dislocations in metallurgy. The kernel
is given by $K(t, s) = 1/(t + s - 2)$ and the weight function by $w(t) = (1 - t)^{\alpha} (1 + t)^{-1/2}$.
Numerical results reported in the literature differ somewhat, e.g. in Erdogan et al. [12] the
solution is y(1) = 13.13, while in Bassani et al. [4] y(1) = 15.863. This is because the
quadrature method converges very slowly and large algebraic systems must be solved to attain
a reasonable accuracy. Here, a general stable implementation of FAST along with an iterative
technique could also be found useful.

2.  Additional  examples  for  which  FAST  is  applicable  can  be  found  in  the  following  references: 
The  linear  transport  equation  (Kelley  et.  al.  [26],  eq.  43),  the  antiplane  shear  crack 
(Tsamasphyros  &  Theocaris  [33],  eq.  46),  edge  crack  (Boiko  &  Karpenko  [5],  eq.  20), 

multidimensional  singular  integrals  (e.g.  Monegatto  [29]),  Hadamard’s  singular  integrals  (e.g. 
Kaya  [25],  Ioakimidis  [23]),  Calogero’s  equation  for  the  asymptotic  density  of  the  zeros  of 

Hermite  polynomials  (D.  Atkinson  [3]),  the  H-function  of  Chandrasekhar,  Neutron  transport 
equation,  Prandtl’s  equation  for  thin  airfoil,  the  sail  equation  (e.g.  Elliott  [10]),  Vorticity 
equation  (Constantin  et.  al.  [7],  Krasny  [28]). 

A detailed analysis of iterative methods for singular integral equations and their applications
discussed above is beyond the scope of this paper. Some of these methods have already been
described in the literature (Tsamasphyros and Theocaris [33], Ioakimidis [24], Gerasoulis [15]),
while others are currently under investigation and will be reported in a future communication
(Gerasoulis [16]).
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ABSTRACT.  This  paper  discusses  some  aspects  of  the  stability  problems  of  a 
free-flying  column  subjected  to  axial  thrusts.  In  an  age  of  spacecrafts  and 
missiles,  the  stability  of  unsupported  flying  structures  is  obviously  of  great 
importance.  Suprisingly  though,  there  has  not  been  a  great  deal  of  work 
addressing  this  type  of  problem.  In  this  paper,  first  the  brief  history  of  the 
lateral  stability  of  a  column  is  reviewed,  and  then  the  basic  characteristic 
features  of  the  stability  problem  of  a  free-free  column  are  discussed.  The 
mathematical  techniques  developed  to  solve  these  problems  depend  on  a  particular 
problem  considered.  The  most  general  case  requires  the  solution  of  a  nonself- 
adjoint  differential  equation/boundary  condition  system,  which  is  homogeneous 
and  with  zero  eigenvalues.  Numerical  procedures  for  such  a  system  appear  to 
work  well,  although  theoretical  proof  of  convergence  Is  still  lacking.  Results 
of  these  procedures  are  discussed. 

I. INTRODUCTION. In this paper, a long, slender free-free beam is used as
a model for a flexible missile or rocket. The beam behaves as a Bernoulli-Euler
column, and in this case is assumed to be rotating about its longitudinal axis
and subject to an end thrust (Figure 1). Of prime interest is the effect of the
rotation on the lateral stability of the beam. The motion is assumed to be
planar.

Different phases of the problem have been investigated in the past. A summary
of the previous work is given in Reference 1. Silverberg [2] was the first
to include thrust on the flying column. The differential equation for a
free-flying beam was given earlier, as shown in Reference 3. Beal [4] and Feodos'ev
[5] obtained results with pulsating thrust. In 1972, Matsumoto and Mote [6]
treated a similar problem with directional thrust. In this case, however,
feedback control was included and a time delay was applied to the control. The next
contribution to understanding the problem was given by Peters and Wu [1]. They
concentrated on mode shape solutions at zero frequency for different thrusts.
A comprehensive description is also given in Reference [1] for the eigenvalues
and mode shape near zero thrust and with a thrust direction close to that of a
follower force. Recently, Park and Mote [7] included a concentrated mass and
feedback control. The feedback control was allowed to be taken from different
points along the beam.

As stated above, in this paper the effect of rotation on the stability of a
free-free beam is assessed. The next section is a description of the problem.
In Section III the variational statement used for the solution is described.
Section IV shows how the variational statement is used with finite elements to
solve the problem, and Section V discusses the results of the investigation.


II. PROBLEM STATEMENT. The geometry of the problem is shown in Figure 1,
which depicts a free-flying column subject to axial thrust with directional
control and rotating about its axis. The beam has a constant cross-section of
area A, density $\rho$, Young's modulus E, and moment of inertia I. The
differential equation for the beam is given by


$$EI\,u^{iv} + P\left(\frac{x}{\ell}\,u'\right)' + \rho A\,\ddot{u} + \rho A\,\Omega^{2}u = 0 \qquad (1)$$


The first three terms represent the column as treated in Reference 1. The last
term on the left-hand side shows the effect of the rotation. The boundary
conditions are given by

$$u''(0) = 0, \quad u''(\ell) = 0, \quad u'''(0) = 0, \quad EI\,u'''(\ell) - K_\theta P\,u'(\ell) = 0 \qquad (2)$$

In dimensionless form with

$$\hat{u} = u/\ell, \quad \hat{x} = x/\ell, \quad \hat{t} = t/T, \quad T^{2} = \frac{\rho A \ell^{4}}{EI}, \quad Q = \frac{P\ell^{2}}{EI}, \quad \omega = \Omega T \qquad (3)$$

and writing

$$\hat{u}(\hat{x},\hat{t}) = \hat{u}(\hat{x})\,e^{\lambda\hat{t}} \qquad (4)$$

the differential equation then becomes

$$\hat{u}'''' + Q(\hat{x}\hat{u}')' + \lambda^{2}\hat{u} + \omega^{2}\hat{u} = 0 \qquad (5)$$

with the boundary conditions

$$\hat{u}''(0) = 0, \quad \hat{u}'''(0) = 0, \quad \hat{u}''(1) = 0, \quad \hat{u}'''(1) - K_\theta Q\,\hat{u}'(1) = 0 \qquad (6)$$

Rewriting Eq. (5) (and dropping hats) as

$$u'''' + Q(xu')' + (\lambda^{2} + \omega^{2})u = 0 \qquad (7)$$

it appears that the addition of rotation simply shifts the frequency of
vibration of the system. The boundary conditions, Eq. (6), become

$$u''(0) = 0, \quad u'''(0) = 0, \quad u''(1) = 0, \quad u'''(1) - K_\theta Q\,u'(1) = 0 \qquad (8)$$

The spatial variables are made dimensionless by dividing through by the beam's
length $\ell$, and time is made dimensionless by dividing through by a constant
$T = (\rho A \ell^{4}/EI)^{1/2}$, which has the units of time.
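
As a quick numerical illustration of this scaling, the following Python sketch converts physical beam data into the dimensionless thrust Q and rotation ω. The function name, argument names, and sample values are hypothetical and not taken from the paper, and the relation ω = ΩT is an assumption made here because it is what the dimensional consistency of Eq. (1) and Eq. (7) requires.

```python
import math

def nondimensionalize(E, I, rho, A, length, P, Omega):
    """Return (T, Q, omega) for a uniform beam.

    T     = sqrt(rho*A*length**4 / (E*I))  time scale
    Q     = P*length**2 / (E*I)            dimensionless thrust
    omega = Omega*T                        dimensionless spin rate (assumed form)
    """
    EI = E * I
    T = math.sqrt(rho * A * length**4 / EI)
    Q = P * length**2 / EI
    omega = Omega * T
    return T, Q, omega

# Hypothetical unit-valued beam: every scale factor collapses to 1.
T, Q, omega = nondimensionalize(E=1.0, I=1.0, rho=1.0, A=1.0,
                                length=1.0, P=2.0, Omega=3.0)
```

Because Q scales with the square of the length and T with its square, a longer beam of the same section reaches a given dimensionless thrust at a much lower physical load.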

The parameter $\lambda$ is, in general, a complex number

$$\lambda = \lambda_R + i\lambda_I$$

where both $\lambda_R$ and $\lambda_I$ are real numbers.

III. VARIATIONAL STATEMENT. To find the form of the variational statement,
the differential equation is multiplied by an arbitrary variation of the
adjoint field variable, $\delta v(x)$, and integrated over the beam length.
Integration by parts indicates the form of the variational statement and the
natural boundary conditions. The variational statement is given by

$$\delta J = 0 \qquad (9)$$

where

$$J = \int_0^1 \left[u''v'' - Qxu'v' + (\lambda^{2}+\omega^{2})uv\right]dx + Q(1+K_\theta)\,u'(1)\,v(1) \qquad (10)$$

Performing the variation of J with respect to u and v, one can arrive at the
original boundary value problem as well as the adjoint. Equation (10) is the
basis for a finite element solution to the described problem.

IV. FINITE ELEMENT AND NUMERICAL FORMULATION. The procedure begins by
taking the variation of Eq. (10) and allowing the variations in the problem
variable, $\delta u(x)$, to be zero, i.e., varying the adjoint variable $v(x)$ only for now,

$$\int_0^1 \left[u''\delta v'' - Qxu'\delta v' + \Lambda^{2}u\,\delta v\right]dx + Q(1+K_\theta)\,u'(1)\,\delta v(1) = 0 \qquad (11)$$

where $\Lambda^{2} = \lambda^{2} + \omega^{2}$. To discretize, the beam is divided into L elements, letting

$$\xi = L\left\{x - \frac{i-1}{L}\right\}, \qquad i = 1,2,3,\ldots,L \qquad (12)$$

be the running coordinate in each element. Substituting Eq. (12) into Eq. (11)

$$\sum_{i=1}^{L}\int_0^1\left\{L^{3}u^{(i)''}\delta v^{(i)''} - Q(\xi+i-1)\,u^{(i)'}\delta v^{(i)'} + \frac{\Lambda^{2}}{L}\,u^{(i)}\delta v^{(i)}\right\}d\xi + Q(1+K_\theta)\,u^{(L)'}(1)\,\delta v^{(L)}(1) = 0 \qquad (13)$$

In order that the displacements and their derivatives within an element be
expressed in terms of their nodal values, the coordinate vectors are introduced:

$$U^{(i)T} = \{U_1^{(i)}\;\; U_2^{(i)}\;\; U_3^{(i)}\;\; U_4^{(i)}\}, \qquad V^{(i)T} = \{V_1^{(i)}\;\; V_2^{(i)}\;\; V_3^{(i)}\;\; V_4^{(i)}\} \qquad (14)$$

$U_1^{(i)}$ and $U_2^{(i)}$ represent the displacement and slope at the left end of the ith
element, and $U_3^{(i)}$ and $U_4^{(i)}$ represent the deflection and slope at the right end. A
similar interpretation is applied to the adjoint coordinate vector $V^{(i)}$. The
transpose is indicated by T.

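Hermitian polynomials are used to relate the displacements within an element
to its nodal values; hence, the following shape function is assumed

$$a^{T}(\xi) = \{1 - 3\xi^{2} + 2\xi^{3},\;\; \xi - 2\xi^{2} + \xi^{3},\;\; 3\xi^{2} - 2\xi^{3},\;\; -\xi^{2} + \xi^{3}\} \qquad (15)$$

so that

$$u^{(i)}(\xi) = a^{T}(\xi)\,U^{(i)}, \qquad v^{(i)}(\xi) = a^{T}(\xi)\,V^{(i)} \qquad (16)$$

Substituting Eq. (16) into Eq. (13)

$$\sum_{i=1}^{L} U^{(i)T}\left\{L^{3}C - Q[D + (i-1)B] + \frac{\Lambda^{2}}{L}A\right\}\delta V^{(i)} + Q[1+K_\theta]\,U^{(L)T}E\,\delta V^{(L)} = 0 \qquad (17)$$

with

$$A = \int_0^1 a\,a^{T}d\xi, \quad B = \int_0^1 a'a'^{T}d\xi, \quad C = \int_0^1 a''a''^{T}d\xi, \quad D = \int_0^1 \xi\,a'a'^{T}d\xi, \quad E = a'(1)\,a^{T}(1) \qquad (18)$$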
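
The element matrices of Eq. (18) can be evaluated exactly, since the shape functions of Eq. (15) are cubic polynomials. The sketch below does this with NumPy's polynomial class; the helper names `integral01` and `gram` are illustrative choices, not the authors' notation.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

# Hermite shape functions of Eq. (15); coefficients in ascending powers of xi.
a = [Poly([1, 0, -3, 2]), Poly([0, 1, -2, 1]),
     Poly([0, 0, 3, -2]), Poly([0, 0, -1, 1])]
da = [p.deriv() for p in a]      # first derivatives a'
dda = [p.deriv(2) for p in a]    # second derivatives a''
xi = Poly([0, 1])                # the polynomial "xi" itself

def integral01(p):
    """Exact integral of polynomial p over [0, 1]."""
    q = p.integ()
    return q(1.0) - q(0.0)

def gram(f, g, weight=Poly([1])):
    """Matrix of integrals of weight * f_j * g_k over [0, 1]."""
    return np.array([[integral01(weight * fj * gk) for gk in g] for fj in f])

A = gram(a, a)                   # int a a^T
B = gram(da, da)                 # int a' a'^T
C = gram(dda, dda)               # int a'' a''^T
D = gram(da, da, weight=xi)      # int xi a' a'^T
E = np.outer([p(1.0) for p in da], [p(1.0) for p in a])   # a'(1) a^T(1)
```

The matrix E has a single nonzero entry, coupling the end slope $U_4$ with the end displacement variation $\delta V_3$; this is the only non-symmetric contribution, which is where the nonself-adjoint character of the follower thrust enters the discrete problem.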

Rewriting Eq. (17),

$$\sum_{i=1}^{L} U^{(i)T}\left\{\Lambda^{2}P^{(i)} + S^{(i)}\right\}\delta V^{(i)} = 0 \qquad (19)$$

where

$$P^{(i)} = A/L, \qquad i = 1,2,\ldots,L$$

$$S^{(i)} = L^{3}C - Q[D + (i-1)B], \qquad i = 1,2,\ldots,L-1$$

$$S^{(L)} = L^{3}C - Q[D + (L-1)B] + Q(1+K_\theta)E \qquad (20)$$

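Using the continuity conditions between the element nodal values

$$U_1^{(i)} = U_3^{(i-1)}, \quad U_2^{(i)} = U_4^{(i-1)}, \quad V_1^{(i)} = V_3^{(i-1)}, \quad V_2^{(i)} = V_4^{(i-1)} \qquad (21)$$

one can write

$$U^{T} = \{U_1^{(1)}\;\; U_2^{(1)}\;\; U_3^{(1)}\;\; U_4^{(1)}\;\; U_3^{(2)}\;\; U_4^{(2)}\;\; \ldots\;\; U_3^{(L)}\;\; U_4^{(L)}\}$$

$$V^{T} = \{V_1^{(1)}\;\; V_2^{(1)}\;\; V_3^{(1)}\;\; V_4^{(1)}\;\; V_3^{(2)}\;\; V_4^{(2)}\;\; \ldots\;\; V_3^{(L)}\;\; V_4^{(L)}\} \qquad (22)$$

Finally, [P] and [S] are N×N matrices with N = 2L+2. Since $\delta V$ is arbitrary,
the eigenvalue problem reduces to

$$U^{T}\{\Lambda^{2}[P] + [S]\} = 0 \qquad (23)$$

which is solved for the eigenvalues.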
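
The assembly and solution just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the nodal slopes are taken as physical derivatives du/dx rather than element-local ones, the element integrals are evaluated by Gauss quadrature instead of the closed-form matrices of Eq. (18), the boundary term is taken with the sign of Eq. (10), and all function names are hypothetical.

```python
import numpy as np

def hermite(xi, h):
    """Cubic Hermite shape functions on an element of length h with nodal
    unknowns (u, du/dx); returns N, dN/dx and d2N/dx2 at local coordinate xi."""
    N = np.array([1 - 3*xi**2 + 2*xi**3, h*(xi - 2*xi**2 + xi**3),
                  3*xi**2 - 2*xi**3, h*(-xi**2 + xi**3)])
    dN = np.array([-6*xi + 6*xi**2, h*(1 - 4*xi + 3*xi**2),
                   6*xi - 6*xi**2, h*(-2*xi + 3*xi**2)]) / h
    d2N = np.array([-6 + 12*xi, h*(-4 + 6*xi),
                    6 - 12*xi, h*(-2 + 6*xi)]) / h**2
    return N, dN, d2N

def assemble(L, Q, K_theta):
    """Global [S] and [P] (N = 2L+2) for u'''' + Q (x u')' + Lambda^2 u = 0.
    Rows correspond to the adjoint (test) variable, columns to u."""
    n = 2*L + 2
    S = np.zeros((n, n))
    P = np.zeros((n, n))
    h = 1.0 / L
    g, w = np.polynomial.legendre.leggauss(4)   # exact up to degree 7
    pts, wts = 0.5*(g + 1.0), 0.5*w             # map from [-1, 1] to [0, 1]
    for e in range(L):
        x0 = e*h
        Se = np.zeros((4, 4))
        Pe = np.zeros((4, 4))
        for xi, wt in zip(pts, wts):
            N, dN, d2N = hermite(xi, h)
            x = x0 + h*xi
            Se += wt*h*(np.outer(d2N, d2N) - Q*x*np.outer(dN, dN))
            Pe += wt*h*np.outer(N, N)
        S[2*e:2*e+4, 2*e:2*e+4] += Se
        P[2*e:2*e+4, 2*e:2*e+4] += Pe
    # Boundary term Q(1+K_theta) u'(1) dv(1): row of v(1), column of u'(1).
    S[n-2, n-1] += Q*(1.0 + K_theta)
    return S, P

def lambda_squared(L, Q, K_theta):
    """Eigenvalues Lambda^2 of (Lambda^2 [P] + [S]) U = 0, sorted by magnitude."""
    S, P = assemble(L, Q, K_theta)
    vals = np.linalg.eigvals(np.linalg.solve(P, -S))
    return vals[np.argsort(np.abs(vals))]

# Check case: no thrust, so the problem reduces to a free-free vibrating beam.
vals = lambda_squared(10, Q=0.0, K_theta=0.0)
```

With Q = 0 the two smallest values are the rigid-body zeros, and the next one approximates $-\beta^{4}$ with $\beta \approx 4.730$, the classical first free-free beam root. For Q > 0 the matrix [S] is no longer symmetric, so a general (non-symmetric) eigenvalue routine is used and complex values of $\Lambda^{2}$ can appear.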


V. CONCLUSIONS AND DISCUSSION. In this paper, we have included rotation
about the longitudinal axis in the dynamic stability study of a free-flying
missile subjected to axial thrusts. It is assumed that the motions of bending
and the thrust are in the same plane. In the differential equation, the only
difference resulting from the introduction of rotation is a change in the
frequency parameter from $\lambda^{2}$ to

$$\Lambda^{2} = \lambda^{2} + \omega^{2} \qquad (24)$$

where $\omega$ is the rotation. Consequently, all the stability curves obtained
previously [1] can be used with some simple modifications. It should be noted that
in Reference [1], we have written (with $\omega = 0$)

$$\Lambda = \lambda = \lambda_R + i\lambda_I \qquad (25)$$

and the stability character of the problem is indicated by: (1) stable
vibrations: $\lambda_I \neq 0$, $\lambda_R = 0$; (2) unstable by buckling (divergence): $\lambda_R \neq 0$, $\lambda_I = 0$;
(3) unstable by flutter: $\lambda_R \neq 0$, $\lambda_I \neq 0$; and (4) marginally stable: $\lambda_I = \lambda_R = 0$.

For the present case, the stability behavior is indicated as above, but
with $\Lambda_I$ and $\Lambda_R$ replacing $\lambda_I$ and $\lambda_R$ in the previous stability curves:

$$\Lambda = \Lambda_R + i\Lambda_I \qquad (26)$$

and

$$\Lambda^{2} = (\Lambda_R + i\Lambda_I)^{2} = \lambda^{2} + \omega^{2} = (\lambda_R + i\lambda_I)^{2} + \omega^{2} \qquad (27)$$

or

$$\lambda^{2} = (\lambda_R + i\lambda_I)^{2} = \Lambda^{2} - \omega^{2} = (\Lambda_R + i\Lambda_I)^{2} - \omega^{2} \qquad (28)$$

From Eq. (28), when $\Lambda_R = 0$, $\lambda^{2} = -\Lambda_I^{2} - \omega^{2}$, hence $\lambda_R = 0$ and $\lambda_I^{2} = \Lambda_I^{2} + \omega^{2}$.
Thus, originally stable vibrations will remain stable with a higher vibration
frequency. On the other hand, when $\Lambda_I = 0$, $\Lambda^{2} = \Lambda_R^{2}$, hence $\lambda^{2} = \Lambda_R^{2} - \omega^{2}$.
Thus, originally divergent motions will become stable vibrations when $\Lambda_R^{2} < \omega^{2}$.
In the case of marginal stability, $\Lambda = 0$ will certainly be stabilized since $\lambda_I^{2} = \omega^{2}$.


In the case of flutter instability, Eq. (28) states that $\lambda$ is complex ($\lambda_I \neq 0$,
$\lambda_R \neq 0$) if and only if $\Lambda$ is complex ($\Lambda_I \neq 0$, $\Lambda_R \neq 0$). Therefore, the
flutter instability is not affected by the introduction of the rotation, which
is an interesting observation.
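
The four cases above follow mechanically from the shift $\lambda^{2} = \Lambda^{2} - \omega^{2}$ of Eq. (28). A small Python sketch of that bookkeeping is given below; the function name, the tolerance, and the sample values are illustrative assumptions rather than anything taken from the stability curves.

```python
def classify(Lambda_sq, omega_sq, tol=1e-9):
    """Classify the motion from lambda^2 = Lambda^2 - omega^2, Eq. (28).

    lambda^2 complex        -> lambda_R and lambda_I both nonzero: flutter
    lambda^2 real, negative -> lambda purely imaginary: stable vibration
    lambda^2 real, positive -> lambda real, one root positive: divergence
    lambda^2 zero           -> marginally stable
    """
    lam_sq = complex(Lambda_sq) - omega_sq
    if abs(lam_sq.imag) > tol:
        return "flutter"
    if lam_sq.real < -tol:
        return "stable vibration"
    if lam_sq.real > tol:
        return "divergence"
    return "marginally stable"

# Hypothetical eigenvalues read off a non-rotating stability curve.
cases = {
    "vibration":  classify(-100.0, 500.0),      # stays a stable vibration
    "divergence": classify(100.0, 500.0),       # stabilized, since 100 < 500
    "rigid body": classify(0.0, 500.0),         # stabilized by the rotation
    "flutter":    classify(100.0 + 50.0j, 500.0),  # unaffected by the rotation
}
```

The imaginary part of $\lambda^{2}$ equals that of $\Lambda^{2}$ because $\omega^{2}$ is real, which is why the flutter branch is carried over unchanged.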

Several demonstrative stability curves of $\Lambda^{2}$ (and $\lambda^{2}$) versus $Q/\pi^{2}$ are
shown in Figures 2 through 5. Only the branches of the lowest eigenvalues are shown,
since they are the ones which dictate the stability behavior. Figure 2 shows
the two lowest stable vibration modes and two rigid body modes on the $\Lambda^{2} = 0$
axis. This is the case of a free-flying missile with a follower thrust ($K_\theta = 0$)
and with a dimensionless rotation of $\omega^{2} = 500$. The two flexural modes coalesce
at the load $Q/\pi^{2} \approx 11.18$, beyond which flutter instability begins. The rigid body
modes without rotation indicate marginal stability. Due to the rotation $\omega$, the
axis is shifted from $\Lambda^{2} = 0$ to $\lambda^{2} = 0$; therefore, these previously rigid body
modes are now stable modes of vibration. The thrust that is controlled with a
small negative tangency ($K_\theta = -0.05$) is shown in Figure 3. It is noted in this
figure that the divergence instability without rotation is stabilized by $\omega^{2} = 500$.
However, the new critical load is lowered from $Q/\pi^{2} \approx 11.18$ to 5.30, not
because of $\omega^{2}$, but due to the negative control parameter $K_\theta$. Figure 4 shows the
case of $K_\theta = -1$, in which the thrust has a fixed direction along the inertia axis.
It is clear that the divergence instability of the lowest branch is stabilized,
so that the critical load has been raised from zero to $Q_{cr} = 1.50\,\pi^{2}$. Finally,
the case of a small positive tangency control parameter ($K_\theta = 0.05$) is shown in
Figure 5. In this figure, the original divergence instability at $Q/\pi^{2} = 3.00$ is
stabilized by $\omega^{2}$. However, the original critical load of flutter instability at
$Q/\pi^{2} = 9.90$ is not changed by the rotation. Hence, the critical load in this
case is raised from 3.00 to 9.90 due to the rotation of $\omega^{2} = 500$.

FIGURE 1. GEOMETRY OF THE PROBLEM

FIGURE 4. CRITICAL LOAD PLOT FOR I

ABSTRACT. A study is made of planar detonation wave birth and
evolution in a reacting gas mixture. The chemistry is described by the
high activation energy global reaction A → B. A prescribed heat flux,
applied at a planar boundary, is used to initiate the thermomechanical
processes which result in detonation. Finite difference methods are used
to solve the one-dimensional, compressible, unsteady describing equations,
which include reaction effects and transport terms. Early power
deposition at the boundary heats an adjacent thin layer of gas in which
significant chemical heat release occurs. The total power deposited generates
thermomechanical effects which cause a fully resolved shock to propagate
away from the boundary. The shock conditions the unreacted gas and
thereby initiates a reaction process that propagates with a speed similar in
magnitude to that of the shock. The resulting chemical power deposition
enhances the shock strength, which in turn accelerates the reaction
process further. The total rate of chemical energy release increases
relentlessly, evolving suddenly into a power pulse nearly 100 times larger than
the initial boundary heat flux. This explosive process leads to the
formation of a coupled shock-reaction zone structure that propagates as an
identifiable entity.

I. INTRODUCTION. The work presented at the Fourth Army
Conference on Applied Mathematics and Computing was derived from
Clarke et al. (1). A complete copy of this manuscript is available upon
request to the first author. A summary of the key results is given in
the next section.

II. SUMMARY. An extension of Clarke et al. (2) to a reactive gas
mixture provides a basis for studying the transient development of a
planar detonation consisting of a fully resolved lead shock followed by a
reaction zone. The detonation propagates away from a planar confining
boundary, leaving behind a hot, reactant-depleted zone in which a
variety of weak gasdynamic waves can be observed.

The mathematical model is based on the equations for a reactive,
compressible, perfect gas with transport effects. Initially the gas is at
rest, at a temperature of 300 K and a pressure of $1.01\times10^{5}$ Pa. A heat flux
of 10* W/m² is applied on the planar boundary during a period of
$O(10^{-9}\,\mathrm{s})$. Solutions obtained from an implicit finite difference calculation
show that on a time-scale of O(10"*s) there is a ten-fold increase in
global power deposition associated with chemical reaction induced by
shock passage. During an ensuing transitional period of $O(10^{-7}\,\mathrm{s})$ the
reaction rate is enhanced gradually and the shock is strengthened
accordingly. The propagation speed of the shock is larger than that of the
clearly identifiable reaction zone, although both move at supersonic
speed relative to the undisturbed gas. The transition period is
terminated dramatically by a rapidly developing burst of power deposition of
magnitude $10^{10}$ W/m². This rapid reaction process develops during a
period of about 2x10'*s and in a region 5x10'* cm in extent. The
explosion is localized sufficiently in space and time to ensure that an
inertially dominated heat-addition process occurs. It follows that the
temperature rise is associated with an enormous pressure increase, some 50
times the initial value of 1 atmosphere. Localized reactant depletion
terminates the explosive process and the global power deposition declines.
Further shock strengthening, resulting from compression waves generated
by the pressure buildup, leads to a significant reduction in the ignition
delay time. As a result, the reaction zone Mach number increases
rapidly up to that of the shock and the two structures propagate away
together like a ZND-wave. The reaction zone structure itself is not unlike
that described by Kassoy and Clarke (3).

In conclusion, our calculations show that direct initiation of
detonation requires sufficient power input to first of all generate a suitably
strong precursor shock wave, which then becomes the trigger to switch
on vigorous chemical activity in its wake. The hallmark of this vigor is
its capacity to exploit the inertia of the fluid by raising local pressures
and temperatures, with little diminution in local density; the pressure
waves so formed propagate and increase precursor shock strength, which
therefore lifts overall density levels, as well as those of pressure and
temperature. All of these processes interlock in a continuously
accelerated sequence that progresses toward a steady state in the shape of a
ZND detonation.

Although we have restricted our attention here to "direct
initiation", as it is called, it must be said that it is difficult to imagine any
other sequence of events over longer times, provided that the precursor
shock strength is not allowed, or forced, to decay in the transitional time
domain. It is here that initial input energy, as opposed to power, can
have an important part to play in preventing shock decay, by geometric
attenuation for example in two and three dimensions. In the
one-dimensional geometry of the present study, calculations carried out for
finite switch-off times of the boundary heat flux show that the initiation
process is only delayed by reductions of the switch-off time and, hence, of
input energy. There is no suggestion from our calculations that the "formation" and
"ZND-like" events will not always eventually follow the "transitional"
ones. Thus, in one dimension our calculations imply that initiation of
detonation will always follow deposition of power, no matter how little
the energy. Of course, this implies in its turn that unlimited distance is
available for the precursor shock to travel. In reality this distance is
always limited and so, as power diminishes, we would expect energy to
rise in monotone fashion for detonation initiation to take place within a
given configuration. This is just what Dabora (4) finds in his
experiments with hydrogen-oxygen-nitrogen mixtures in a shock tube. We
remark that Dabora's boundary input power levels range from 2x10’ to
10’ MW/m², which is precisely in the range of values that we have
studied.

ABSTRACT

The interaction between the rotating band and the rifling is usually used
to provide a desired angular acceleration or spin to a projectile as the
projectile accelerates along the length of the barrel of a gun. Evaluations
and improvements of the design of rotating bands need an understanding of
the stresses and deformations of the band resulting from the interaction.
Accurate knowledge of these requires a three-dimensional solution of the problem.
However, a significant amount of the mechanics of interaction can be understood
by studying an idealized two-dimensional problem that considers only the
circumferential flow of the rotating band material into the rifling groove.
In this report, a numerical solution procedure is discussed for a
two-dimensional problem. The circumferential flow is discussed for specific
cases.

*This paper was presented at the Third Army Conference on Applied Mathematics
and Computing.

I. INTRODUCTION.

The interaction between the rotating band and the rifling is usually
used to provide a desired spin or angular acceleration to a projectile as the
projectile accelerates along the length of the barrel of a gun. This
interaction takes place in the following way. Before entering the barrel, a fired
projectile enters a "forcing cone". The outer diameter of the projectile is
usually designed to be smaller than the minor diameter of the bore. However,
the outer diameter of the rotating band, which is located on the outer hard
surface of the projectile, is larger than the minor diameter of the bore (see
Fig. 1). As a result, the radial dimensions of the rotating band are reduced
at first in the forcing cone and then in the barrel.

Inside the forcing cone, the reduction in the radial dimensions of the
rotating band is accompanied by an axial flow along the length of the
projectile. The deformation process is axisymmetric since the forcing cone has a
smooth wall. Upon encountering the rifling in the barrel, radial flow, axial
flow along the length of the projectile and circumferential flow into the
rifling groove occur. The deformation becomes non-axisymmetric and hence a
three-dimensional description is necessary. It is to be noted here that the
rifling has a twist along the length of the barrel. It is the combination of
the aforementioned deformation of the rotating band and the designed twist of
the rifling along the length of the barrel that results in an angular
acceleration being imparted to the projectile as the projectile travels along the
length of the barrel.

The functions of a rotating band impose certain basic requirements on the
band material and geometry. The rotating band must have sufficient "plastic
flow characteristics" to deform from its initial configuration to a shape
dictated by the forcing cone and the rifling in order to reduce the wear of the
barrel. At the same time, it must also have sufficient strength to

(a) transmit the torque to the projectile,

(b) withstand the propellant gas pressure, and

(c) provide the required attachment characteristics to
the projectile.

As a consequence of these requirements, evaluations and improvements of the
design of rotating bands need an understanding of the stresses and deformations
of the band, which should be obtained by including in the analysis the following
parameters:

(a) flow strength of the rotating band material,

(b) breathing strength of the projectile wall,

(c) depth of the rotating band, and

(d) the interference.

The results of a preliminary study of the stresses and deformations in the
rotating band of an artillery projectile caused by the interaction of the band
and the rifling are presented in this report. The interaction of
the rotating band and the forcing cone has been examined by other
investigators as a one-dimensional problem [1].

II. DESCRIPTION OF ANALYTICAL MODEL.

As discussed before, the interaction of the rotating band and rifling
causes the band material to flow in three directions. Accurate determination
of the corresponding states of stress and deformation in the band requires a
three-dimensional, elastic-plastic, large-deformation dynamic analysis. In the
present study, a simplified two-dimensional problem with only radial and
circumferential flow is analyzed. Since the distance between two neighboring
rifling grooves is sufficiently small compared to the radius of the projectile,
the effect of the curvature can be neglected. As a result, the simplified
problem can be considered as a two-dimensional plane strain problem, as shown in
Fig. 2. The gun barrel is assumed to move along the y direction into the
rotating band. The initial velocity of the gun barrel is determined on the
basis of the longitudinal velocity of the projectile and the amount of
interference between the rotating band and the gun barrel.

In general, the rotating band material is much softer than that of the gun
barrel and the projectile, resulting in deformations of the band. The following
assumptions are made in the first stage of the analysis:

(a) the gun barrel and projectile are both rigid,

(b) the rotating band material is modeled as an elastic-perfectly
plastic material and strain rate effects are absent,

(c) perfect bonding exists between the rotating band and the
projectile.

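III. GOVERNING EQUATIONS

In the analysis, $\xi$, $\eta$ are used to denote the undeformed coordinates.
After the body deforms, $x(\xi,\eta,t)$ and $y(\xi,\eta,t)$ are used to denote the deformed
configuration of the particle at time t whose initial coordinates are $(\xi, \eta)$. The
governing equations are then written in terms of Cauchy stresses and Lagrangian
coordinates. These equations, for a two-dimensional plane strain problem, are
as follows:

Kinematics equations:

$$\dot{x} = u, \qquad \dot{y} = v \qquad (1)$$

Equations of motion:

$$\dot{u} = \frac{1}{\rho}\frac{\partial\sigma_{xx}}{\partial x} + \frac{1}{\rho}\frac{\partial\tau_{xy}}{\partial y}, \qquad \dot{v} = \frac{1}{\rho}\frac{\partial\tau_{xy}}{\partial x} + \frac{1}{\rho}\frac{\partial\sigma_{yy}}{\partial y} \qquad (2)$$

Constitutive equations:

$$\dot{\sigma}_{xx} = \left(\rho\frac{dp}{d\rho} + \frac{4}{3}G\right)\frac{\partial u}{\partial x} + \left(\rho\frac{dp}{d\rho} - \frac{2}{3}G\right)\frac{\partial v}{\partial y} + \tau_{xy}\left(\frac{\partial u}{\partial y} - \frac{\partial v}{\partial x}\right) \qquad (3)$$

$$\dot{\sigma}_{yy} = \left(\rho\frac{dp}{d\rho} - \frac{2}{3}G\right)\frac{\partial u}{\partial x} + \left(\rho\frac{dp}{d\rho} + \frac{4}{3}G\right)\frac{\partial v}{\partial y} + \tau_{xy}\left(\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right) \qquad (4)$$

$$\dot{\tau}_{xy} = G\frac{\partial v}{\partial x} + G\frac{\partial u}{\partial y} + \frac{1}{2}(\sigma_{xx} - \sigma_{yy})\left(\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right) \qquad (5)$$

$$\sigma_{zz} = \nu(\sigma_{xx} + \sigma_{yy}) \qquad (6)$$

Continuity equation:

$$\dot{\rho} = -\rho\frac{\partial u}{\partial x} - \rho\frac{\partial v}{\partial y} \qquad (7)$$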

It is assumed that stress deviators and pressure can be defined as follows:

$$S_{xx} = p + \sigma_{xx}, \qquad S_{yy} = p + \sigma_{yy}, \qquad S_{zz} = -S_{xx} - S_{yy} \qquad (8)$$

where

$$p = -\tfrac{1}{3}(\sigma_{xx} + \sigma_{yy} + \sigma_{zz}) \qquad (9)$$

The von Mises yield condition for an elastic-perfectly plastic material is given by:

$$S_{xx}^{2} + S_{yy}^{2} + S_{zz}^{2} + 2\tau_{xy}^{2} \le \tfrac{2}{3}Y^{2} \qquad (10)$$

$u$: velocity in x-direction

$v$: velocity in y-direction

$\sigma_{xx}$: normal stress in x-direction

$\sigma_{yy}$: normal stress in y-direction

$\sigma_{zz}$: normal stress in z-direction

$\tau_{xy}$: shear stress

$S_{xx}$: normal stress deviator in x-direction

$S_{yy}$: normal stress deviator in y-direction

$S_{zz}$: normal stress deviator in z-direction

$\rho$: density

$p$: hydrostatic pressure

$G$: modulus of rigidity

$Y$: yield stress in simple tension
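
To make the yield check concrete, the following Python sketch evaluates the pressure, the stress deviators, and the left-hand side of the von Mises condition minus $\tfrac{2}{3}Y^{2}$ for a given plane strain stress state. The function name and the sample stress values are hypothetical; the formulas are the standard deviator and von Mises definitions quoted above.

```python
import math

def yield_function(sxx, syy, szz, txy, Y):
    """Return f = Sxx^2 + Syy^2 + Szz^2 + 2*txy^2 - (2/3)*Y^2.

    f < 0 : stress state inside the yield surface (elastic)
    f = 0 : stress state on the yield surface
    f > 0 : inadmissible for an elastic-perfectly plastic material
    """
    p = -(sxx + syy + szz) / 3.0        # hydrostatic pressure
    Sxx = p + sxx                       # stress deviators
    Syy = p + syy
    Szz = -Sxx - Syy
    return Sxx**2 + Syy**2 + Szz**2 + 2.0 * txy**2 - (2.0 / 3.0) * Y**2

Y = 1.0
f_hydrostatic = yield_function(-5.0, -5.0, -5.0, 0.0, Y)      # pure pressure
f_uniaxial = yield_function(Y, 0.0, 0.0, 0.0, Y)              # simple tension at Y
f_shear = yield_function(0.0, 0.0, 0.0, Y / math.sqrt(3.0), Y)  # pure shear
```

A purely hydrostatic state never yields, whatever its magnitude, while simple tension at Y and pure shear at $Y/\sqrt{3}$ both sit exactly on the yield surface.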

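Equations (1) to (5) and (7) are written in the following compact form

$$\{\dot{u}\} = [A]\{u\}_{,x} + [B]\{u\}_{,y} \qquad (11)$$

where

$$\{u\}^{T} = \{u,\; v,\; \sigma_{xx},\; \sigma_{yy},\; \tau_{xy},\; \rho\}$$

The matrices A and B are not constants in the finite deformation problem of
elastic-plastic materials and are given as follows:

$$[A] = \begin{pmatrix}
0 & 0 & 1/\rho & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/\rho & 0 \\
\rho\frac{dp}{d\rho} + \frac{4}{3}G & -\tau_{xy} & 0 & 0 & 0 & 0 \\
\rho\frac{dp}{d\rho} - \frac{2}{3}G & \tau_{xy} & 0 & 0 & 0 & 0 \\
0 & G + \frac{1}{2}(\sigma_{xx} - \sigma_{yy}) & 0 & 0 & 0 & 0 \\
-\rho & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

$$[B] = \begin{pmatrix}
0 & 0 & 0 & 0 & 1/\rho & 0 \\
0 & 0 & 0 & 1/\rho & 0 & 0 \\
\tau_{xy} & \rho\frac{dp}{d\rho} - \frac{2}{3}G & 0 & 0 & 0 & 0 \\
-\tau_{xy} & \rho\frac{dp}{d\rho} + \frac{4}{3}G & 0 & 0 & 0 & 0 \\
G - \frac{1}{2}(\sigma_{xx} - \sigma_{yy}) & 0 & 0 & 0 & 0 & 0 \\
0 & -\rho & 0 & 0 & 0 & 0
\end{pmatrix}$$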

From the assumed two-dimensional plane strain case, the boundary conditions of
a typical part of the rotating band, as shown in Figure 3, are as follows:

On the boundary $\xi = 0$, $0 \le \eta \le R$:

$$u = 0, \qquad \tau_{xy} = 0$$

On the boundary $\xi = L$, $0 \le \eta \le R$:

$$u = 0, \qquad \tau_{xy} = 0$$

On the boundary $0 \le \xi \le L$, $\eta = 0$:

$$u = v = 0$$

On the boundary $0 \le \xi \le L$ and $\eta = R$, if $x(\xi,\eta,t) \le L_1$,

$$v = at, \qquad \tau_{xy} = 0$$

where a is the constant acceleration. If $x(\xi,\eta,t) > L_1$,

$$\sigma_{xx}n_1 + \tau_{xy}n_2 = 0 \qquad \text{and} \qquad \tau_{xy}n_1 + \sigma_{yy}n_2 = 0$$

where $n = n_1 i + n_2 j$ is the unit normal to the deformed surface on the
boundary $\xi \le L$, $\eta = R$. If $L_1 \le x(\xi,\eta,t) \le L$ and $y(\xi,\eta,t) \le R - \tfrac{1}{2}at^{2}$,

$$\sigma_{xx}n_1 + \tau_{xy}n_2 = 0, \qquad \tau_{xy}n_1 + \sigma_{yy}n_2 = 0$$

IV.  SOLUTION  TECHNIQUE 

The problem described in Section III includes material nonlinearity and
geometric nonlinearity. It is easily seen that analytical solutions for such
a transient dynamic response problem with finite deformation and
elastic-plastic material are very difficult to obtain. Therefore, suitable
numerical methods have to be used to solve the problem. Finite difference
techniques based on the Lax-Wendroff scheme [2] and the modified version of
Strang's method [3,4] due to Morris and Gottlieb [5,6] are used for the study
of the transient dynamic response of elastic-plastic solids under conditions of
finite deformations. Since the finite deformation formulation used in the
present problem is based on a Cauchy stress formulation and an updated
Lagrangian approach, the finite difference meshes may distort with increasing
time. Thus, the conventional finite difference schemes for spatial derivatives,
in which the meshes are fixed for all time, are no longer suitable. A second
order accurate numerical technique based on the Lax-Wendroff scheme, the
modified Strang method and the contour MacCormack two-step procedure has been
developed at Georgia Institute of Technology for use with deformable Lagrangian
meshes [7,8]. This modified scheme is efficient and is also suitable for
deformed meshes. Therefore, it has been used to solve the rotating band and
rifling interaction problem described in Section III. The detailed formulation
is given in [7,8].

The modified Strang's method along with the Lax-Wendroff scheme enables
one to write the solution of Eq. (11) as follows:

    \{u\}^{t+2\Delta t} = L_y L_x L_x L_y \{u\}^t                          (12)

where L_x, L_y are the one-dimensional Lax-Wendroff operators, which are
defined as:

    L_x\{u\}^t = \{u\}^t + \Delta t [A]\{u\}^t_{,x}
                 + \frac{1}{2}\Delta t^2 \left( [A]^2\{u\}^t_{,xx}
                 + \left( [A][A]_{,u}\{u\}^t_{,x}
                 + [A]_{,u}[A]\{u\}^t_{,x} \right)\{u\}^t_{,x} \right)      (13)

    L_y\{u\}^t = \{u\}^t + \Delta t [B]\{u\}^t_{,y}
                 + \frac{1}{2}\Delta t^2 \left( [B]^2\{u\}^t_{,yy}
                 + \left( [B][B]_{,u}\{u\}^t_{,y}
                 + [B]_{,u}[B]\{u\}^t_{,y} \right)\{u\}^t_{,y} \right)      (14)
In using the procedure described in equations (12) to (14), it is
necessary to compute the products of matrices, derivatives of matrices, and
finite difference approximations for second order partial derivatives at each
step of numerical integration. These computations can be time consuming when
the matrices become large. But they can be eliminated by using a MacCormack
two-step procedure [9], which consists of the successive application of two
first order accurate schemes to achieve second order accuracy. The steps
involved in calculating L_x are as follows:

    \{u\}^* = \{u\}^t + \Delta t [A]\{u\}^t_{,x}                            (15)

    L_x\{u\}^t = \frac{1}{2}\left(\{u\}^t + \{u\}^*\right)
                 + \frac{\Delta t}{2} [A]^* \{u\}^*_{,x}                    (16)

where [A]^* is evaluated by using the value of \{u\}^*. A similar expression
can be written for L_y in the y direction. For the purpose of stability and
accuracy, it is necessary to calculate the spatial derivatives in (15) by a
forward difference in the predictor step and those in (16) by a backward
difference in the corrector step, or vice versa. As discussed earlier, in a
problem with finite deformations the initially regular meshes distort with
increasing time, thus contour difference forms for calculating the spatial
derivatives are necessary. Specifically, for the present two-step method, this
requires the use of contour finite differences of backward and forward types
with second order accuracy. Such a numerical technique has been developed
[7,8]. By using MacCormack's two-step procedure (15), (16) and the modified
Strang's method (12), a finite difference scheme for the solution of equation
(11) can be written as follows:

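    \{V\}^{(1)} = \{u\}^t + \Delta t [B] \Delta_y \{u\}^t

    L_y\{u\}^t = \{V\}^{(2)} = \frac{1}{2}\left(\{u\}^t + \{V\}^{(1)}\right)
                 + \frac{\Delta t}{2} [B]^{(1)} \nabla_y \{V\}^{(1)}

    \{V\}^{(3)} = \{V\}^{(2)} + \Delta t [A]^{(2)} \Delta_x \{V\}^{(2)}     (17)

    L_x L_y\{u\}^t = \{V\}^{(4)} = \frac{1}{2}\left(\{V\}^{(2)} + \{V\}^{(3)}\right)
                 + \frac{\Delta t}{2} [A]^{(3)} \nabla_x \{V\}^{(3)}

    \{V\}^{(5)} = \{V\}^{(4)} + \Delta t [A]^{(4)} \nabla_x \{V\}^{(4)}

    L_x L_x L_y\{u\}^t = \{V\}^{(6)} = \frac{1}{2}\left(\{V\}^{(4)} + \{V\}^{(5)}\right)
                 + \frac{\Delta t}{2} [A]^{(5)} \Delta_x \{V\}^{(5)}

    \{V\}^{(7)} = \{V\}^{(6)} + \Delta t [B]^{(6)} \nabla_y \{V\}^{(6)}

    L_y L_x L_x L_y\{u\}^t = \{V\}^{(8)} = \frac{1}{2}\left(\{V\}^{(6)} + \{V\}^{(7)}\right)
                 + \frac{\Delta t}{2} [B]^{(7)} \Delta_y \{V\}^{(7)}

    \{u\}^{t+2\Delta t} = \{V\}^{(8)}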
The operators \Delta_x, \Delta_y are forward contour differences. Similarly,
\nabla_x and \nabla_y are backward contour differences. The form of
\Delta_x\{u\}_{i,j}, which approximates \{u\}_{,x} at the grid point (i,j), is
written in the following way:

    \Delta_x\{u\}_{i,j} =
        \frac{ (\{u\}_{i+1,j} - \{u\}_{i,j})(y_{i,j+1} - y_{i,j-1})
               - (\{u\}_{i,j+1} - \{u\}_{i,j-1})(y_{i+1,j} - y_{i,j}) }
             { (x_{i+1,j} - x_{i,j})(y_{i,j+1} - y_{i,j-1})
               - (x_{i,j+1} - x_{i,j-1})(y_{i+1,j} - y_{i,j}) }             (18)
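
The forward contour difference of equation (18) can be sketched in a few lines. The following is only an illustration, not the authors' code: the function name `forward_contour_dx`, the array layout (first index i, second index j) and the distorted test mesh are assumptions made here. The check uses a field that is linear in x and y, for which the quotient in (18) reproduces the x-derivative exactly on any non-degenerate mesh.

```python
import numpy as np

def forward_contour_dx(u, x, y, i, j):
    """Forward contour difference of equation (18) at interior node (i, j).

    u, x, y are 2-D arrays indexed [i, j]; x and y hold the current
    (possibly distorted) node coordinates.
    """
    num = ((u[i + 1, j] - u[i, j]) * (y[i, j + 1] - y[i, j - 1])
           - (u[i, j + 1] - u[i, j - 1]) * (y[i + 1, j] - y[i, j]))
    den = ((x[i + 1, j] - x[i, j]) * (y[i, j + 1] - y[i, j - 1])
           - (x[i, j + 1] - x[i, j - 1]) * (y[i + 1, j] - y[i, j]))
    return num / den

# Hypothetical distorted 5 x 5 mesh (illustrative only).
xi, eta = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
x = xi + 0.3 * eta + 0.05 * xi * eta
y = eta + 0.2 * xi - 0.04 * xi * eta

# Linear test field: du/dx = 2 everywhere.
u = 2.0 * x + 3.0 * y
du_dx = forward_contour_dx(u, x, y, 2, 2)
print(du_dx)
```

On an undistorted mesh the second products in the numerator and denominator vanish and the expression reduces to the ordinary forward difference.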


The set of finite difference equations (17) is valid for any interior
point. A slightly different version is needed for points on the boundaries
[10].
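
The predictor-corrector pair (15), (16) can be illustrated on a much simpler model problem. The sketch below is an assumption-laden stand-in, not the scheme of [7,8]: it treats the scalar equation u_t = a u_x with constant a on a uniform periodic grid, so ordinary forward and backward differences replace the contour differences, and the name `maccormack_step` is hypothetical.

```python
import numpy as np

def maccormack_step(u, a, dt, dx):
    """One two-step update for u_t = a * u_x on a periodic, uniform grid."""
    # Predictor, cf. (15): forward difference on the current values.
    u_star = u + dt * a * (np.roll(u, -1) - u) / dx
    # Corrector, cf. (16): backward difference on the predicted values.
    return 0.5 * (u + u_star) + 0.5 * dt * a * (u_star - np.roll(u_star, 1)) / dx

n = 50
dx = 1.0 / n
x = np.arange(n) * dx
u0 = np.sin(2.0 * np.pi * x)
a = 1.0

# With a*dt/dx = 1 the scheme shifts the profile by exactly one cell.
shifted = maccormack_step(u0, a, dx / a, dx)

# With a*dt/dx = 0.98 the amplitude of the sine wave stays bounded.
u = u0.copy()
for _ in range(200):
    u = maccormack_step(u, a, 0.98 * dx / a, dx)
max_amplitude = np.abs(u).max()
print(max_amplitude)
```

For this linear, constant-coefficient case the two steps together coincide with the Lax-Wendroff update, which is why no second derivatives or matrix derivatives have to be formed explicitly.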

V.  NUMERICAL  RESULTS  AND  DISCUSSIONS. 

The numerical procedure described in Section IV has been employed to solve
a specific two-dimensional plane strain problem simulating the radial and
circumferential flow of the band material into rifling grooves as a result of
its interaction with the rifling. The geometry of the undeformed rotating band
and the physical properties of the band material are listed below:

(a) geometry (ref. to Fig. 3)

    L   = 0.2 in (0.508 cm)
    L_1 = 0.075 in (2.032 cm)
    R_1 = 0.1 in (0.254 cm)
    R   = 0.15 in (0.381 cm)

(b) physical properties (copper)

    Density, \rho = 8.941 g/cc
    Young's Modulus, E = 1.6 x 10^7 psi (110320 MPa)
    Poisson's Ratio, \nu = 0.32
    Yield stress in simple tension, Y = 4.5 x 10^4 psi (310.275 MPa)

The initial configuration of grid points employed in the finite difference
discretization is shown in Fig. 4. It contains the region bounded by the edges
at x = 0, L and y = 0, R. There are 200 cells in the region and the dimensions
of these cells are \Delta x = \Delta y = 0.01 in (0.0254 cm).

In the numerical analysis, two distinct cases have been considered. In
the first case, the gun barrel is assumed to move into the rotating band at a
constant velocity of 0.01 in/μsec, while in the second case it moves at a
constant acceleration of 5.22 x 10 in/μsec. In the numerical computations the
CFL (Courant-Friedrichs-Lewy) number is chosen to be 0.98 for the finite
difference scheme.
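
To give a feel for what a CFL number of 0.98 implies, the following sketch estimates a time step from the copper properties and cell size listed above. It rests on an assumption not stated in the paper: that the fastest signal is the elastic dilatational wave of an isotropic solid, c = sqrt(E(1-\nu)/(\rho(1+\nu)(1-2\nu))). The actual scheme bounds the step through the eigenvalues of its coefficient matrices on the deforming mesh, so the figure below is only an order-of-magnitude guide.

```python
import math

# Material data and cell size as listed in the text, converted to SI units.
E = 110320.0e6      # Young's modulus, Pa (110320 MPa)
nu = 0.32           # Poisson's ratio
rho = 8941.0        # density, kg/m^3 (8.941 g/cc)
dx = 0.0254e-2      # cell size, m (0.0254 cm)
cfl = 0.98          # CFL number quoted in the text

# ASSUMPTION: the limiting speed is the elastic dilatational wave speed.
c = math.sqrt(E * (1.0 - nu) / (rho * (1.0 + nu) * (1.0 - 2.0 * nu)))

# Largest time step allowed by dt * c / dx <= cfl on the undeformed mesh.
dt = cfl * dx / c
print(f"wave speed ~ {c:.0f} m/s, time step ~ {dt * 1e6:.3f} microseconds")
```

Under these assumptions the step comes out at a few hundredths of a microsecond, and it shrinks further as cells are compressed.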
The results of the first case are depicted in Figures 5 and 6 with plots
of the deformed grid configuration, the velocity field, and the stress field at
times 2.08 μsec and 2.79 μsec. The results indicate a strong circumferential
flow of the band material into the rifling groove. Both the flow and the
stress fields indicate that the flow initiates from the top surface of the
band. Similar results for the second case are shown in Figs. 7 and 8.

The velocity and stress fields in the deformed configurations obtained in
the two cases exhibit the behavior or trend expected in the interaction between
the rotating band and the rifling in the gun barrel. The preliminary results in
the study demonstrate that the numerical scheme described in this paper can be
used to analyze the rotating band and rifling interaction problem involving
plastic flow and large deformations. Future efforts will be devoted to studying
the influence of parameters such as the flow strength of the rotating band
material, the stiffness of the gun barrel and the projectile, the dimensions of
the rotating band, and the friction at the contact surface of the rotating band
and the gun barrel on the stresses and deformations of the band.


Since the numerical procedure used in the present analysis is based on an
explicit finite difference scheme, the selection of the spatial and temporal
increment sizes must satisfy the stability condition governed by
max|\lambda(A)|. This leads to a restriction on the size of the time steps.
The computational efficiency of the present procedure can be improved by
considering the use of a hybrid "explicit-implicit method". Such a combination
has the benefits of both methods and will be investigated in future studies.
Figure 1. Schematic of an Artillery Projectile in the Gun Barrel (labeled
part: Forcing Cone)

Figure 2. A Typical Section

Figure 3. Boundary Conditions on Typical Section

Figure 7a. Deformed Configuration

Figure 7b. Velocity Field (cycle 200, time 11.45 μsec; scale 1597.44 in/sec)

Figure 7c. Stress Field (cycle 200, time 11.45 μsec; scale 279198.20)

Figure 8a. Deformed Configuration (cycle 400)

Figure 8b. Velocity Field

Figure 8c. Stress Field
USING SUPERCOMPUTERS TODAY AND TOMORROW*


John  R.  Rice 

Computer  Science  Department 
Purdue  University 
West  Lafayette,  Indiana  47907 

ABSTRACT. The past and future growth of supercomputer power is summarized along
with the changes in modes of accessing and using supercomputers. Three particular
applications are considered from 1983, 1985 and 1995 (hypothetical). It is argued that
the software and peripheral support for supercomputers has fallen far behind the increase
in power. Solutions to the access and software support problems are discussed; it is
concluded that the access problem is both difficult and very expensive to solve while the
software support problem is difficult and only moderately expensive to solve.


I. SUPERCOMPUTER POWER. First, we briefly review the growth in supercomputer
power. The peak performance grew slightly less than exponentially over the 1965
to 1980 period. This growth masks the fact that the typical scientist and engineer outside
a few weapons laboratories experienced very little growth in the power of the computers
available to them. The prices came down but the power did not grow dramatically.
Supercomputer power growth has accelerated since 1980 and even more acceleration is
forecast for the next 10 years. This growth is summarized in Table 1.

Table 1. Some trends in scientific supercomputing

Year/Machine       Speed           Speed Increase
                                   10-year    20-year

1966/CDC 6600      1 MFLOPS
1975/CDC 7600      4 MFLOPS        4
1980/Cray 1        10 MFLOPS       5
1985/Cyber 205     100 MFLOPS      25         100
1990/-             2 GFLOPS        200        1000
1995/-             200 GFLOPS      2000       50,000


The  projected  1995  machine  has  1000  processors  with  a  2  nanosecond  cycle  time. 

*The author of this paper presented it at the Third Army Conference on
Applied Mathematics and Computing.

During the period 1965-1985 there has been a significant change in access to
computers. Batch processing has been replaced by some sort of terminal access. The types
of access vary considerably and are listed below in what we think is decreasing
frequency:

1.  Terminal  attached  to  front  end  machine  connected  to  network  connected  to 
supercomputer 

2.  Terminal  attached  to  front  end  machine  connected  to  supercomputer 

3.  Workstation  attached  to  network  attached  to  front  end  machine  attached  to  net¬ 
work  attached  to  supercomputer 

4.  Terminal  attached  to  supercomputer 

There  are  other  types  of  access,  but  the  key  point  here  is  that  most  access  is  via  one  or 
more  layers  of  intermediate  machines  and  networks. 

One  of  the  barriers  in  effective  use  of  supercomputers  is  the  disparate  speeds  within 
the  intermediate  machines  and  networks.  Table  2  presents  data  on  the  transfer  rates  of 
these  facilities. 

Table 2. Peak and effective transfer rates of various facilities
measured in bits per second (K = 1000, M = 1,000,000)

Facility               Peak Rate    Effective Rate

Telephone              300          300
2400 baud line         2400         2400
9600 baud line         9600         9600
ARPANET                57K          20K
Bus on VAX 11/780      1M           160K
Ethernet               10M          1.5M
CDC LCN                50M          3M
Cyber 205 channel      200M         100M

It is easy to see that current supercomputers produce results at rates that completely
swamp the capacity of most users' access facilities.

Figure 1 shows the current configuration of the Cyber 205 facility at Purdue
University. Most users access the Cyber 205 through the CDC 6000 systems or by long haul
networks attached to a VAX 11/780.

While progress in access to supercomputers has been significant (yet modest
compared to the progress in supercomputer speed), the progress in programming has been
uneven and even in reverse for some areas. Editors, on-line file systems, program update
systems, libraries, etc. have greatly improved the environment for writing programs.
Programming itself has gone downhill. In 1966 Fortran IV was well established. In
1986 supercomputer users can use Fortran 77 (a small improvement), but, if you want to
get real supercomputer speeds, you must use machine specific Fortran statements and
tricks, and generally be rather knowledgeable about the whole machine organization. I
believe that the programming task itself is perhaps 40% as efficient on the current Cray
and CDC supercomputers as it was in 1965 (again, this is for achieving something close to
the potential of the machines). Automatic vectorizers are very worthwhile, but they also
fall far short of providing the vectorization possible.

Figure 1. Configuration of the Purdue high speed file transfer network to support the
Cyber 205 supercomputer.

I  illustrate  this  development  in  programming  with  two  randomly  selected  examples. 
In  [DoEi84]  we  see  the  subprogram 
      SUBROUTINE SMXPY (N1, Y, N2, LDM, X, M)
      REAL Y(*), X(*), M(LDM, *)
      DO 20 J = 1, N2
         DO 10 I = 1, N1
            Y(I) = Y(I) + X(J) * M(I,J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

and learn that Cray users should learn to write the inner loop as

      DO 10 I = 1, N1
         Y(I) = (((Y(I) + X(J-3) * M(I,J-3)) + X(J-2) * M(I,J-2))
     $        + X(J-1) * M(I,J-1)) + X(J) * M(I,J)
   10 CONTINUE.
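
The relation between the two Fortran loops can be checked with a short sketch. It is written in Python purely for illustration; the function names are hypothetical, and the handling of the outer loop (stepping J by 4, with leftover columns processed first) is an assumption, since only the inner loop is quoted above. Both versions accumulate the same matrix-vector product; the second touches Y once for every four columns.

```python
import numpy as np

def smxpy_plain(y, x, m):
    """Column-by-column update Y(I) = Y(I) + X(J) * M(I,J)."""
    y = y.copy()
    n1, n2 = m.shape
    for j in range(n2):
        for i in range(n1):
            y[i] += x[j] * m[i, j]
    return y

def smxpy_unrolled(y, x, m):
    """Same update with the column loop unrolled to depth 4."""
    y = y.copy()
    n2 = m.shape[1]
    rem = n2 % 4
    # Cleanup: columns that do not fill a complete group of four.
    for j in range(rem):
        y = y + x[j] * m[:, j]
    # Main loop: four columns per pass, so Y is loaded and stored once per pass.
    for j in range(rem + 3, n2, 4):
        y = (((y + x[j - 3] * m[:, j - 3]) + x[j - 2] * m[:, j - 2])
             + x[j - 1] * m[:, j - 1]) + x[j] * m[:, j]
    return y

rng = np.random.default_rng(0)
m = rng.standard_normal((7, 10))
x = rng.standard_normal(10)
y = rng.standard_normal(7)
print(np.allclose(smxpy_plain(y, x, m), smxpy_unrolled(y, x, m)))
```

The arithmetic is unchanged; the gain on a vector machine comes from the reduced memory traffic on Y, which is exactly the kind of machine-specific knowledge the text says the programmer now has to supply by hand.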

On  a  Cyber  205  the  simple  computation 

      FORALL (I=1:NTDIM, J=1:NSDIM) SCORES(I,J) = 60. + 40.*SIN(I*J*63.21)


should  be  programmed  as 

      SEQ(1;NSDIM) = Q8VINTL(1,1;SEQ(1;NSDIM))
      DO 100 I = 1,NTDIM
         ARG(1;NSDIM) = I*63.21*SEQ(1;NSDIM)
         SEQ0(1;NSDIM) = VSIN(ARG(1;NSDIM);SEQ0(1;NSDIM))
         SCORES(I,1;NSDIM) = 60. + 40.*SEQ0(1;NSDIM)
  100 CONTINUE


in  order  to  achieve  high  speed  execution. 

We  conclude  that  the  software  and  peripheral  support  for  supercomputers  has  fallen 
far  behind  the  increase  in  supercomputer  computational  power. 

II.  THE  SIZE  OF  SUPERCOMPUTER  ANSWERS.  Everyone  in  the  supercomputer 
area  and  most  outside  it  visualize  that  supercomputer  applications  use  enormous  amounts 
of  computation,  millions  and  billions  and  trillions  of  arithmetic  steps.  Much  less  widely 
known is that the answers produced in a typical supercomputer application are also
huge.  By  answer,  we  mean  the  information  the  user  needs  in  order  to  understand  the 
computed  solution;  we  do  not  mean  the  total  set  of  numerical  results  computed,  which  is 
usually  very  much  larger.  We  illustrate  this  with  three  sample  applications,  two  real  ones 
from  1983  and  1985  and  one  hypothetical  one  from  1995. 

1983 APPLICATION: The high speed impact of two steel cubes into a block of
aluminum. This computation was performed at Los Alamos [Los83] on a Cray 1 and
used 30 minutes to cover 2.5 microseconds of real time. The problem is
eight-dimensional with 3 space variables, time and 4 dependent variables (temperature,
pressure, density of steel and density of aluminum).

Thirty minutes of Cray time represents about 150 billion instructions (12
nanosecond cycle time) including about 18 billion arithmetic operations (10 MFLOPS).
The answer can be represented by data on a 100 by 80 by 80 spatial grid for 150 time
steps; each of the 96 million grid points has 4 values (64 bits long). Thus only 400
million numbers represent the result of 18 billion computed numbers, or the answer requires
only about 2% of the numbers computed. Some nice color plots are given in [Los83] and
illustrate the effectiveness of this medium for presenting information about computed
results. Note that the answer is 3 gigabytes in size, which is close to the size of the entire
disk memory space on many large scale computer systems.
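
The counts in this example follow from a few multiplications, reproduced in the sketch below. All inputs are the figures quoted in the text; the variable names are illustrative, and the assumption that one machine cycle corresponds to one instruction is the text's own simplification.

```python
# Figures quoted in the text for the 1983 impact computation.
run_seconds = 30 * 60          # 30 minutes of Cray 1 time
cycle_time = 12e-9             # 12 nanosecond cycle
flops_rate = 10e6              # 10 MFLOPS sustained

cycles = run_seconds / cycle_time            # about 150 billion
arithmetic_ops = run_seconds * flops_rate    # about 18 billion

grid_points = 100 * 80 * 80 * 150            # spatial grid times time steps
values = grid_points * 4                     # 4 dependent variables per point
answer_bytes = values * 8                    # 64-bit values

fraction = values / arithmetic_ops           # share of computed numbers kept

print(f"{cycles:.2e} cycles, {arithmetic_ops:.2e} arithmetic operations")
print(f"{grid_points:,} grid points, {values:,} values, "
      f"{answer_bytes / 1e9:.2f} gigabytes, fraction {fraction:.3f}")
```

The script gives 96 million space-time grid points, 384 million stored values and about 3 gigabytes, matching the rounded figures in the text.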

1985  APPLICATION:  Accretion  of  material  into  a  black  hole  (2D  model).  This 
computation  [Sm85]  shows  the  evolution  of  a  black  hole  over  a  period  of  millions  of 
years.  It  assumes  axial  symmetry  to  reduce  the  problem  to  a  feasible  size.  The  answer  is 
1.25  billion  numbers  (10  gigabytes).  The  author  discusses  how  to  view  the  results  using 
color  movies.  He  notes  that  his  computation  only  provides  moderate  resolution  in  time 
and  space  and  that  a  good  quality  movie  would  require  considerably  more  computation 
and  produce  a  considerably  larger  answer. 

In  this  same  issue  of  Science  magazine  there  is  a  discussion  by  Joy  and  Gage 
[JoGa85]  which  analyzes  the  information  flow  required  to  produce  color  movies.  Modest 
resolution,  slow  motion  requires  250  Kbytes/sec  while  high  resolution,  normal  motion 
requires  about  20  Mbytes/sec.  The  author  argues  that  color  movies  are  the  only  way  to 
really  assimilate  the  results  of  many  supercomputer  computations. 

I  estimate  that  a  3D  black  hole  model  giving  comparable  accuracy  would  have 
about  1.5  terabytes  in  the  answer.  This  would  produce  a  100  hour  movie  with  normal 
motion  and  modest  resolution. 

1995  APPLICATION:  Tank  battle  simulation.  In  this  hypothetical  application  we 
assume  there  are  six  tanks  and  the  study  focuses  on  the  weapons  system,  the  targeting 
system,  the  armor  and  the  defensive  systems.  Thus,  intensive,  detailed  computational 
analysis  is  made  of  a  special  event  such  as  a  shell  hit,  laser  strike  or  mine  explosion.  In 
one  of  these  special  events,  the  physics  is  followed  at  the  level  of  the  shell  explosion, 
shell  case  fragmentation  and  attempted  penetration  of  the  armor  by  blast  pressure  and 
heat. Other aspects such as mechanics of the tanks or terrain are simulated at a much
coarser level.

A  summary  of  the  computation  is  as  follows: 

Independent variables:  3 space, time, input of 6 tank drivers,
                        input of 6 tank gunners

Dependent variables:    tank positions
                        state of all weapons systems
                        state of all defensive systems
                        effects of all special events

Computation:            1 hour
                        2 mega-giga instructions (2 nanosecond cycle, 1000
                        processors)
                        700 tera floating point operations (200 GFLOPS machine)

Answer:                 A. General Scene: 200 by 200 by 50 grid
                        B. One tank geometry: 100 by 100 by 25
                        C. One tank weapon system: 10,000 variables
                        D. One tank defensive system: 10,000 variables
                        E. Tank mechanics: 2000 variables
                        F. High level battle scene: 5000 variables, 5000 time
                           steps
                        G. Special Events: 200 events with
                           200 x 100 x 100 x 200 grid

The  size  of  the  answer  is  then  (in  megawords) 

A  +  6B  +  6C  +  6D  +  6E  +  F  +  G 
=  2  +  6(.25  +  .01  +  .01  +  .002)  +  25  +  200  *  400 

~  1,000,000 

Thus the size of this answer is about 8 terabytes. This answer could be shown, in full, as
a color movie with normal motion and high resolution that lasts about 100-120 hours.
We visualize that the study of this application would involve several people viewing
different parts of the answer over a period of time.
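
The movie length quoted here can be checked against the data rate cited earlier from [JoGa85]. The sketch below assumes, as the text implies, that the whole 8 terabyte answer is streamed once at the high resolution, normal motion rate of about 20 Mbytes/sec, with decimal units (1 terabyte = 10^12 bytes); the variable names are illustrative.

```python
answer_bytes = 8e12        # about 8 terabytes, from the text
movie_rate = 20e6          # high resolution, normal motion: ~20 Mbytes/sec

movie_hours = answer_bytes / movie_rate / 3600.0
print(f"movie length: about {movie_hours:.0f} hours")
```

The result falls inside the 100-120 hour range given in the text.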

We now pose the question: Suppose the answer has been computed and resides in
the supercomputer system; how long will it take to move the answer to the user's
location? We use the effective transfer rates from Table 2 plus the size of the answers to
produce the results of Table 3. It is obvious from Table 3 that systems which separate the
user from the supercomputer by 2 Ethernets and a VAX are totally unable to provide
reasonable service for many supercomputer applications. Few will put up with waiting a
week to see the results of a 30 minute computation. And once he gets the answer
"locally" the user has neither a place to put it nor adequate means to view it.
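
The transfer times discussed here are simply answer size divided by effective rate. The sketch below redoes that division using the effective rates of Table 2 and the answer sizes quoted in Section II (3 gigabytes, 10 gigabytes and 8 terabytes); the dictionary names and the `humanize` helper are illustrative, and decimal units are assumed throughout.

```python
# Effective transfer rates in bits per second (Table 2).
EFFECTIVE_RATE_BPS = {
    "Telephone": 300,
    "9600 baud line": 9600,
    "ARPANET": 20e3,
    "VAX 11/780 bus": 160e3,
    "Ethernet": 1.5e6,
    "Cyber 205 channel": 100e6,
}

# Answer sizes in bytes, as quoted in the text.
ANSWER_BYTES = {"1983": 3e9, "1985": 10e9, "1995": 8e12}

def transfer_seconds(answer_bytes, rate_bps):
    """Time to move an answer of the given size at the given bit rate."""
    return answer_bytes * 8 / rate_bps

def humanize(seconds):
    """Express a duration in the largest unit that gives a value >= 1."""
    for unit, size in (("years", 365.25 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} seconds"

for facility, rate in EFFECTIVE_RATE_BPS.items():
    times = [humanize(transfer_seconds(size, rate)) for size in ANSWER_BYTES.values()]
    print(f"{facility:<18}", *times, sep="  ")
```

The values agree with Table 3 to within its rounding; for instance, the 1983 answer takes 4 minutes over the Cyber 205 channel and several hours over Ethernet.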


III.  SUPERENVIRONMENTS.  We  draw  three  conclusions  from  the  above  material: 

1. Today's peripherals/workstations/networks are grossly inadequate even for
today's supercomputations.

2.  Today’s  programming  environments/languages  for  supercomputers  are  grossly 
inadequate,  even  antiquated. 

3.  Raw  computing  power  will  increase  dramatically  in  the  next  decade. 

The peripheral/workstation/network problem is not easily solved. Fiber optics
networks provide a great deal of capacity for networks, but are not yet very cheap.
Moderately priced terabit memory systems are not now on the horizon. Workstations
with high quality color graphics and movie capabilities are quite expensive and probably
will remain so for some time. For the next decade it may well be that large organizations
will have a few superworkstations which are shared by a large community of users.

Table 3. Times to transfer the answers of the three applications using
various facilities.

                                  Application
Facility               1983         1985         1995

Telephone              3 years      9 years      6 millennia
9600 baud line         1 month      3 months     2 centuries
ARPANET                2 weeks      7 weeks      1 century
VAX 11/780 bus         2 days       6 days       7 years
Ethernet               5 hours      15 hours     16 months
Cyber 205 channel      4 min        13 min       1 week

Run time               30 min       1 hour       1 hour

The programming environment/language problem has many technical difficulties to
be overcome, but the initial problem is simply lack of effort. The software support for
supercomputers is very meager. We have senior scientists and engineers using facilities
that would be instantly rejected by travel agents, junior high math students, secretaries
and the general public. It is incredible to see one of the nation's scarcest human resources
wasted due to the lack of a modest investment (compared to the other aspects of
supercomputing) in software support.

One key software area is very high level languages appropriate for scientific
computations. Figures 2-4 show three examples of the kind of things we should expect. We do
not discuss the ELLPACK [RiBo85], DEQSOL [Ume83] or PROTRAN [AiRi83]
systems in any detail but do note they have the following characteristics:

1.  They  dramatically  improve  programming  productivity. 

2.  They  were  implemented  with  moderate  efforts  (2-4  man  years). 

3.  They  improve  execution  time  efficiency. 

Each of these languages has shortcomings that one would not expect in production
quality  systems  for  supercomputers,  yet  they  represent  a  great  advance  over  the  software 
currently  supplied  with  supercomputers. 

The other key software area is how to map computations onto complex supercomputer
architectures so as to produce high efficiency execution. This is a challenging
technical  problem  where  many  approaches  are  being  actively  pursued.  However,  there  is 
still  little  indication  that  the  existing  and  future  techniques  will  be  embodied  into  good 
user-oriented  tools  or  systems. 

We  close  with  Figure  5  which  shows  the  schematic  of  a  superworkstation  which  is 
appropriate  for  supercomputers.  It  will  cost  10-50  times  as  much  as  the  current  good 
workstation.  It  needs  supersoftware  that  will  also  cost  4-50  times  as  much  as  current 
software  for  supercomputers.  The  superworkstation  and  supersoftware  are  equally 
important,  but  the  total  investment  for  the  software  will  be  an  order  of  magnitude  less. 
Then  one  might  have  the  superenvironment  to  take  full  advantage  of  the  supercomputer 
power  that  will  appear  in  the  next  decade. 
*     THE PLATEAU PROBLEM
EQUATION.  (1.+UY(X,Y)**2) UXX + (1.+UX(X,Y)**2) UYY
           - 2.*UX(X,Y)*UY(X,Y) UXY
           + 2.*(UX(X,Y)*UYY(X,Y) - UY(X,Y)*UXY(X,Y)) UX
           + 2.*(UY(X,Y)*UXX(X,Y) - UX(X,Y)*UXY(X,Y)) UY
           = 2.*(UX(X,Y)*UYY(X,Y) - UY(X,Y)*UXY(X,Y))*UX(X,Y)
           + 2.*(UY(X,Y)*UXX(X,Y) - UX(X,Y)*UXY(X,Y))*UY(X,Y)
*
BOUNDARY.  U = BOUND(X,Y) ON Y = 0.0 $ U = BOUND(X,Y) ON Y = 1.0
           U = BOUND(X,Y) ON X = 1.0 $ U = BOUND(X,Y) ON X = 0.0
*
GRID.      5 X POINTS $ 5 Y POINTS
TRIPLE.    SET ( U = ZERO )
FORTRAN.
      DO 100 IT = 1,5
DISCRETIZATION.  HERMITE COLLOCATION
SOLUTION.  BAND GE
FORTRAN.
  100 CONTINUE
OUTPUT.    PLOT (U)
SUBPROGRAMS.
      FUNCTION BOUND(X,Y)
      BOUND = SIN(X+AMAX1(.66, .1+Y**2))*EXP(X-Y)
      RETURN
      END
END.

Figure 2. An ELLPACK program that solves the Plateau problem (the soap film
problem)  using  Newton  iteration  combined  with  Hermite-cubic  collocation 
for  the  linearized  problem. 


      PARAMETER ( N = 16 )
      REAL MATRIX HILBERT(N,N), X(N,4), B(N,4), RESID(N,4)
      REAL VECTOR RNORM(4)
C     CREATE HILBERT MATRIX
      ASSIGN HILBERT(I,J) = 1/(I+J-1.)
C     DEFINE HILBERT MATRIX, FIRST 3 RIGHT SIDES
      ASSIGN B(I,1) = 0.0
             B(I,2) = 1.0
             B(I,3) = 1.0 + .01*SIN(100.*I)
             B(1,1) = 1.0
C     COMPUTE 4TH SIDE TO MAKE SOLUTION = 1.
      DO 20 I = 1,N
   20 SUM HILBERT(I,J) ; FOR (J=1,N) ; IS B(I,4)
C     SOLVE THE 4 SYSTEMS AND COMPUTE NORM OF RESIDUALS
      LINSYS HILBERT*X = B ; HIGHACCURACY ; SAVE HILBERT
      PRINT B,X
      ASSIGN RESID = HILBERT*X - B
             RNORM = RESID'*RESID
             RNORM(K) = SQRT(RNORM(K))
      PRINT RNORM
      END


Figure  3.  A  PROTRAN  program  that  creates  a  Hilbert  matrix  and  four  right  sides,  solves 
these  linear  systems  and  prints  the  least  squares  residual  for  each  system. 


VAR     TT ;
DOM     x = [0:1] ,
        y = [0:1.2] ;
TDOM    t = [0:1] ;
MESH    x = [0.0:1.0:0.2] ,
        y = [0.0:1.2:0.2] ,
        t = [0.0:1.0:0.02] ;
CONST   A = 0.62 ;
REGION  R  = [*,*] ,
        L  = [0,*] ,
        R1 = [1,*] ,
        D  = [*,0] ,
        U  = [*,1.2] ;
EQU     dt[TT] = A*[lapl[TT]] ;
INIT    TT = 100 AT R ;
BOUND   TT = 200 AT D ,
        TT = 200 AT U ,
        dx[TT] = 0 AT L+R1 ;
SCHEME ;
  ITER NT UNTIL NT GT 4 ;
    TT<+1> = TT + DLT*A*lapl[TT] ;
    PRINT TT AT R ;
    DISP  TT AT R ;
  END ITER ;
END SCHEME ;
END ;

Figure  4.  A  DEQSOL  program  that  solves  a  time  dependent  partial  differential  equation. 

The  solution  obtained  this  way  ran  three  times  as  fast  as  the  same  method 
implemented  in  ordinary  Fortran  and  then  hand  optimized  for  vector  speeds. 
Figure 5. Schematic diagram of a superworkstation appropriate for the supercomputers
of the 1990's.